Saturday May 09, 2009

DTrace and Performance Tools Seminar This Week - Atlanta, Ft. Lauderdale, Tampa

I'm doing briefings on DTrace and Solaris Performance Tools this week in Atlanta, Ft. Lauderdale, and Tampa.  Click the links below to register if this is of interest and you can attend.  Each is a 2 1/2 to 3 hour briefing that stays pretty technical, with lots of examples.

From the flyer:

Join us for our next Solaris 10 Technology Brief featuring DTrace.  DTrace, Solaris 10's powerful new framework for system observability, helps system administrators, capacity planners, and application developers improve performance and problem resolution. 

ATLANTA, GA - May 12, 2009
LOCATION: Classroom Resource Group, Atlanta
TIME: 8:30 AM Registration, 9:00 AM - 12:00 PM Session
DIRECTIONS: http://www.crgatlanta.com/directions.asp
REGISTER: http://www.suneventreg.com/cgi-bin/pup_registration.pl?EventID=2705

HOLLYWOOD, FL - May 13, 2009
LOCATION: Seminole Hard Rock Hotel
TIME: 8:30 AM Registration, 9:00 AM - 12:00 PM Session
DIRECTIONS: http://www.seminolehardrockhollywood.com/getting_here/directions.php
REGISTER: http://www.suneventreg.com/cgi-bin/pup_registration.pl?EventID=2706

TAMPA, FL - May 14, 2009
LOCATION: University of South Florida
TIME: 8:30 AM Registration, 9:00 AM - 12:00 PM Session
DIRECTIONS: http://www.msc.usf.edu/directions.htm
REGISTER: https://www.suneventreg.com//cgi-bin/register.pl?EventID=2707

What You'll Learn
You can't improve what you can't see. DTrace provides safe, production-quality, top-to-bottom observability - from PHP application scripts down to the device drivers - without modifying applications or the system.  This seminar will introduce DTrace and the DTrace Toolkit as key parts of an overall Solaris performance and observability toolkit.
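
To give a flavor of what we cover, here is the sort of DTrace one-liner the session builds up from (a generic example, not taken from the seminar materials) - it counts system calls by process name, system-wide, without touching the applications:

# Count system calls by process name; Ctrl-C prints the totals.
dtrace -n 'syscall:::entry { @[execname] = count(); }'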

AGENDA:
8:30 AM To 9:00 AM      Check In, Continental Breakfast
9:00 AM To 9:10 AM      Welcome
9:10 AM To 10:15 AM     DTrace
10:15 AM To 10:30 AM    BREAK
10:30 AM To 11:30 AM    DTrace Continued
11:30 AM To 12:00 PM    Wrap Up, Q&A, Evaluations

We look forward to seeing you at one of these upcoming Solaris 10 DTrace sessions!


Wednesday Apr 29, 2009

How do you use Jumpstart?

Jumpstart is the technology within Solaris that allows a system to be installed remotely across a network. This feature has been in the OS for a long, long time, dating back, I believe, to the start of Solaris 2.0. With Jumpstart, the system to be installed, the Jumpstart client, contacts a Jumpstart server and is installed across the network. This is a huge simplification, since there are nuances to how to set all of this up. Your best bet is to check the Solaris 10 Installation Guide: Network-Based Installations and the Solaris 10 Installation Guide: Custom Jumpstart and Advanced Installations.

Jumpstart makes use of rules to decide how to install a particular system, based on its architecture, network connectivity, hostname, disk and memory capacity, or any of a number of other parameters. The rules select a profile that determines what will be installed on that system and where it will come from. Scripts can be inserted before and after the installation for further customization. To help manage the profiles and post-installation customization, Mike Ramchand has produced a fabulous tool, the Jumpstart Enterprise Toolkit (JET).
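
To make that concrete, here is a rough sketch of the moving parts (the hostname, profile name, and script name are hypothetical). Each line of the rules file matches clients and names a begin script, a profile, and a finish script, with - as a placeholder:

# rules: match a client, then name begin script, profile, finish script
hostname webserver01    -       web_prof        setup_web.sh

# web_prof: a minimal profile describing what to install
install_type    initial_install
system_type     standalone
cluster         SUNWCreq
partitioning    default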

My Questions for You

As a long-time Solaris admin, I have been a fan of Jumpstart for years and years. As an SE visiting many cool companies, I have seen people do really interesting things with Jumpstart. I want to capture how people use Jumpstart in the real world - not just the world of those who create the product. I know that people come up with new and unique ways of using the tools we create, ways we would never imagine.

For example, I once installed 600 systems with SunOS 4.1.4 in less than a week using Jumpstart - remember that Jumpstart never supported SunOS 4.1.4.

But I am not just looking for the weird stories. I want to know what Jumpstart features you use. I'll follow this up with extra, detailed questions about Jumpstart Flash, WANboot, and DHCP vs. RARP. But I want to start with just some basics about Jumpstart.

Lacking a polling mechanism here at blogs.sun.com, you can enter your responses as a comment, answer these questions at SurveyMonkey here, or drop me a note at scott.dickson at sun.com.

  1. How do you install Solaris systems in your environment?
    1. I use Jumpstart
    2. I use DVD or CD media
    3. I do something else - please tell me about it
  2. Do you have a system for automating your Jumpstart configurations?
    1. Yes, we have written our own
    2. Yes, we use JET
    3. Yes, we use xVM OpCenter
    4. No, we do interactive installations via Jumpstart. We just use Jumpstart to get the bits to the client.
  3. What system architectures do you support with Jumpstart?
    1. SPARC
    2. x86
  4. Do you use a sysidcfg file to answer the system identification questions - hostname, network, IP address, naming service, etc.? (A sample sysidcfg appears after this list.)
    1. No, I answer these interactively
    2. Yes, I hand-craft a sysidcfg file
    3. Yes, but it is created via the Jumpstart automation tools
  5. Do you use WANboot? I'll follow up with more questions on this at a later time.
    1. What's WANboot?
    2. I have heard of it, but have never used it
    3. We rely on WANboot
  6. Do you use Jumpstart Flash? More questions on this later, too.
    1. Never heard of it
    2. We sometimes use Flash
    3. We live and breathe Flash
  7. What sort of rules do you include in your rules file?
    1. We do interactive installations and don't use a rules file
    2. We use the rules files generated by our automation tools, like JET
    3. We have a common rules file for all Jumpstarts based on hostname
    4. We use not only hostnames but also other parameters to determine which rule to use for installation
  8. Do you use begin scripts?
    1. No
    2. We use them to create derived profiles for installation
    3. We use them some other way
  9. Do you use finish scripts?
    1. No
    2. We use the finish scripts created by our automation
    3. We use finish scripts to do some minor cleanup
    4. We do extensive post-installation customization via finish scripts. If so, please tell me about it.
  10. Do you customize the list of packages to be installed via Jumpstart?
    1. No
    2. Somewhat
    3. Not only do we customize the list of packages, but we create custom packages for our installation
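
For question 4, here is roughly what a minimal hand-crafted sysidcfg might look like (every value below is a made-up placeholder):

system_locale=en_US
timezone=US/Eastern
terminal=vt100
name_service=NONE
security_policy=NONE
network_interface=primary {hostname=client01
        ip_address=192.168.1.50
        netmask=255.255.255.0
        protocol_ipv6=no
        default_route=192.168.1.1}
root_password=<encrypted hash, as in /etc/shadow>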

Monday Apr 27, 2009

Just got my copy of Pro OpenSolaris

Just got my copy of Pro OpenSolaris by Harry Foxwell and Christine Tran in the mail today!  Can't wait to get a good look and post a review.  I wonder if I can get the authors to inscribe it to me!  

Also got a copy of OpenSolaris Bible by Nick Solter, Gerry Jelinek, and Dave Miner.  Looking forward to cracking into it as well.

Will post reviews shortly.

Tuesday Dec 23, 2008

A Much Better Way to use Flash and ZFS Boot

A Different Approach

A week or so ago, I wrote about a way to get around the current limitation of mixing flash and ZFS root in Solaris 10 10/08. Well, here's a much better approach.

I was visiting with a customer last week, and they were very excited to move forward quickly with ZFS boot in their Solaris 10 environment, even to the point of using this as a reason to encourage people to upgrade. However, when they realized that it was impossible to use Flash with Jumpstart and ZFS boot, they were disappointed. Their entire deployment infrastructure is built around using not just Flash, but Secure WANboot. This means that they have no alternative to Flash; the images deployed via Secure WANboot are always flash archives. So, what to do?

It occurred to me that, in general, the upgrade from a pre-10/08 update of Solaris 10 to Solaris 10 10/08 with a ZFS root disk is a two-step process: first, you upgrade to Solaris 10 10/08 on UFS, and then you use lucreate to copy that environment to a new ZFS ABE. Why not use the same approach in Jumpstart?

Turns out that it works quite nicely. What follows is a framework for how to do it. You will likely want to expand on it, since one thing it does not do is give you any indication of progress once the conversion starts. Here's the general approach:

  • Create your flash archive for Solaris 10 10/08 as you usually would. Make sure you include all the appropriate LiveUpgrade patches in the flash archive.
  • Use Jumpstart to deploy this flash archive to one disk in the target system.
  • Use a finish script to add a conversion program to run when the system reboots for the first time. It is necessary to make this script run once the system has rebooted so that the LU commands run within the context of the fully built new system.

Details of this approach

Our goal when complete is to have the flash archive installed as it always has been, but running from a ZFS root pool, preferably a mirrored one. The conversion requires two phases: the first phase creates the ZFS boot environment, and the second phase mirrors the root pool. In this example, our flash archive is called s10u6s.flar. We will install the initial flash archive onto the disk c0t1d0 and build our initial root pool on c0t0d0.

Here is the Jumpstart profile used in this example:


install_type    flash_install
archive_location nfs nfsserver:/export/solaris/Solaris10/flash/s10u6s.flar
partitioning    explicit
filesys         c0t1d0s1        1024    swap
filesys         c0t1d0s0        free    /

We specify a simple finish script for this system to copy our conversion script into place:

cp ${SI_CONFIG_DIR}/S99xlu-phase1 /a/etc/rc2.d/S99xlu-phase1

You see what we have done: we put a new script into place to run at the end of rc2 during the first boot, and we name the script so that it is the last thing to run. The x in the name makes sure that it runs after other S99 scripts that might be in place. As it turns out, the luactivate that we will do puts its own S99 script in place, and we want to come after that. Naming ours S99x makes it happen later in the boot sequence.

So, what does this magic conversion script do? Let me outline it for you:

  • Create a new ZFS pool that will become our root pool
  • Create a new boot environment in that pool using lucreate
  • Activate the new boot environment
  • Add the script to be run during the second phase of the conversion
  • Clean up a bit and reboot

That's Phase 1. Phase 2 has its own script, run the same way on the next boot, that finishes the mirroring of the root pool. If you are satisfied with a non-mirrored pool, you can stop here and leave phase 2 out. Or you might prefer to make this step a manual process once the system is built. But here's what happens in Phase 2:

  • Delete the old boot environment
  • Add a boot block to the disk we just freed. This example is SPARC, so use installboot. For x86, you would do something similar with installgrub.
  • Attach the disk we freed from the old boot environment as a mirror of the device used to build the new root zpool.
  • Clean up and reboot.

I have been thinking it might be worthwhile to add a third phase that starts a zpool scrub after the final reboot. Strictly speaking, this is sort of optional: the first time something reads from the newly attached drive, ZFS will notice any blocks that have not been synced from the master drive and repair them. But a scrub validates the whole mirror up front.
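
If I were to add that, the third phase could be as small as this sketch (hypothetical - phase 2 would need to drop it into place the same way phase 1 installs phase 2):

# Hypothetical /etc/rc2.d/S99xlu-phase3: scrub the root pool to
# validate the new mirror, then remove ourselves.
zpool scrub rpool
rm /etc/rc2.d/S99xlu-phase3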

The reason we add bootability explicitly to this drive is because currently, when a mirror is attached to a root zpool, a boot block is not automatically installed. If the master drive were to fail and you were left with only the mirror, this would leave the system unbootable. By adding a boot block to it, you can boot from either drive.

So, here's my simple little script that got installed as /etc/rc2.d/S99xlu-phase1. Just to make the code a little easier for me to follow, I first create the script for phase 2, then do the work of phase 1.


# Phase 1 first writes out the phase-2 script, which will run after
# the reboot into the new ZFS boot environment.
cat > /etc/rc2.d/S99xlu-phase2 << EOF
# Phase 2: delete the old UFS BE, put a boot block on the freed
# disk, and attach it as a mirror of the new root pool.
ludelete -n s10u6-ufs
installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t1d0s0
zpool attach -f rpool c0t0d0s0 c0t1d0s0
rm /etc/rc2.d/S99xlu-phase2
init 6
EOF
# Phase 1 proper: move the dump device off the UFS slice, create the
# root pool, copy the running BE into it with Live Upgrade, activate
# the new BE, then clean up and reboot.
dumpadm -d swap
zpool create -f rpool c0t0d0s0
lucreate -c s10u6-ufs -n s10u6 -p rpool
luactivate -n s10u6
rm /etc/rc2.d/S99xlu-phase1
init 6

I think that this is a much better approach than the one I offered before, using ZFS send. This approach uses standard tools to create the new environment, and it allows you to continue to use Flash as a way to deploy archives. The dependency is that you must have two drives in the target system. I don't think that's going to be a hardship, since most folks will use two drives anyway. You will have to keep them as separate drives rather than using hardware mirroring; the underlying assumption is that you previously used SVM or VxVM to mirror those drives.

So, what do you think? Better? Is this helpful? Hopefully, this is a little Christmas present for someone! Merry Christmas and Happy New Year!

Friday Dec 05, 2008

Flashless System Cloning with ZFS

Ancient History

Gather round, kiddies, and let Grandpa tell you a tale of how we used to clone systems before we had Jumpstart and Flash, when we had to carry water in leaky buckets 3 miles through snow up to our knees, uphill both ways.

Long ago, a customer of mine needed to deploy 600(!) SPARCstation 5 desktops all running SunOS 4.1.4. Even then, this was an old operating system, since Solaris 2.6 had recently been released. But it was what their application required. And we only had a few days to build and deploy these systems.

Remember that Jumpstart did not exist for SunOS 4.1.4, and Flash did not exist for Solaris 2.6. So our approach was to build a system, a golden image, the way we wanted it deployed, and then use ufsdump to save the contents of the filesystems. Then we were able to use Jumpstart from a Solaris 2.6 server to boot each of these workstations. Instead of having a Jumpstart profile, we used only a finish script that partitioned the disks and restored the ufsdump images. So Jumpstart just provided us a clean way to boot these systems and apply the scripts we wanted to them.

Solaris 10 10/08, ZFS, Jumpstart and Flash

Now, we have a bit of a similar situation. Solaris 10 10/08 introduces ZFS boot to Solaris, something that many of my customers have been anxiously awaiting for some time. A system can be deployed using Jumpstart and the ZFS boot environment created as a part of the Jumpstart process.

But. There's always a but, isn't there?

But, at present, Flash archives are not supported (and in fact do not work) as a way to install into a ZFS boot environment, either via Jumpstart or via Live Upgrade. It turns out they use the same mechanism under the covers. This is CR 6690473.

So, how can I continue to use Jumpstart to deploy systems, and continue to use something akin to Flash archives to speed and simplify the process?

Turns out the lessons we learned years ago can be used, more or less. Combine the idea of the ufsdump with some of the ideas that Bob Netherton recently blogged about (Solaris and OpenSolaris coexistence in the same root zpool), and you can get to a workaround that might be useful enough to get you through until Flash really is supported with ZFS root.

Build a "Golden Image" System

The first step, as with Flash, is to construct a system that you want to replicate. The caveat here is that you use ZFS for the root of this system. For this example, I have left /var as part of the root filesystem rather than a separate dataset, though this process could certainly be tweaked to accommodate a separate /var.

Once the system to be cloned has been built, you save an image of it. Rather than using flarcreate, you create a ZFS send stream and capture it in a file. Then you move that file to the Jumpstart server, just as you would with a flash archive.

In this example, the ZFS bootfs has the default name - rpool/ROOT/s10s_u6wos_07.


golden# zfs snapshot rpool/ROOT/s10s_u6wos_07@flar
golden# zfs send -v rpool/ROOT/s10s_u6wos_07@flar > s10s_u6wos_07_flar.zfs
golden# scp s10s_u6wos_07_flar.zfs js-server:/flashdirectory

How do I get this on my new server?

Now, we have to figure out how to have this ZFS send stream restored onto the new clone systems. We would like to take advantage of the fact that Jumpstart will create the root pool for us, along with the dump and swap volumes, and will set up all of the needed bits for booting from ZFS. So, let's install the minimal Solaris package set just to get these side effects.

Then, we will use Jumpstart finish scripts to create a fresh ZFS dataset and restore our saved image into it. Since this new dataset will contain the old identity of the original system, we have to reset our system identity. But once we do that, we are good to go.

So, set up the cloned system as you would for a hands-free Jumpstart. Be sure to specify the sysid_config and install_config bits in /etc/bootparams; the manual Solaris 10 10/08 Installation Guide: Custom JumpStart and Advanced Installations covers how to do this. We add to the rules file a finish script (I called mine loadzfs) that will do the heavy lifting. Once Jumpstart installs Solaris according to the profile provided, it runs the finish script to complete the installation.

Here is the Jumpstart profile I used. This is a basic profile that installs the base, required Solaris packages into a ZFS pool mirrored across two drives.


install_type    initial_install
cluster         SUNWCreq
system_type     standalone
pool            rpool auto auto auto mirror c0t0d0s0 c0t1d0s0
bootenv         installbe bename s10u6_req

The finish script is a little more interesting since it has to create the new ZFS dataset, set the right properties, fill it up, reset the identity, etc. Below is the finish script that I used.


#!/bin/sh -x

# TBOOTFS is a temporary dataset used to receive the stream
TBOOTFS=rpool/ROOT/s10u6_rcv

# NBOOTFS is the final name for the new ZFS dataset
NBOOTFS=rpool/ROOT/s10u6f

MNT=/tmp/mntz
FLAR=s10s_u6wos_07_flar.zfs
NFS=serverIP:/export/solaris/Solaris10/flash

# Mount directory where archive (send stream) exists
mkdir ${MNT}
mount -o ro -F nfs ${NFS} ${MNT}

# Create file system to receive ZFS send stream &
# receive it.  This creates a new ZFS snapshot that
# needs to be promoted into a new filesystem
zfs create ${TBOOTFS}
zfs set canmount=noauto ${TBOOTFS}
zfs set compression=on ${TBOOTFS}
zfs receive -vF ${TBOOTFS} < ${MNT}/${FLAR}

# Create a writeable filesystem from the received snapshot
zfs clone ${TBOOTFS}@flar ${NBOOTFS}

# Make the new filesystem the top of the stack so it is not dependent
# on other filesystems or snapshots
zfs promote ${NBOOTFS}

# Don't automatically mount this new dataset, but allow it to be mounted
# so we can finalize our changes.
zfs set canmount=noauto ${NBOOTFS}
zfs set mountpoint=${MNT} ${NBOOTFS}

# Mount newly created replica filesystem and set up for
# sysidtool.  Remove old identity and provide new identity
umount ${MNT}
zfs mount ${NBOOTFS}

# This section essentially forces sysidtool to reset system identity at
# the next boot.
touch /a/${MNT}/reconfigure
touch /a/${MNT}/etc/.UNCONFIGURED
rm /a/${MNT}/etc/nodename
rm /a/${MNT}/etc/.sysIDtool.state
cp ${SI_CONFIG_DIR}/sysidcfg /a/${MNT}/etc/sysidcfg

# Now that we have finished tweaking things, unmount the new filesystem
# and make it ready to become the new root.
zfs umount ${NBOOTFS}
zfs set mountpoint=/ ${NBOOTFS}
zpool set bootfs=${NBOOTFS} rpool

# Get rid of the leftovers
zfs destroy ${TBOOTFS}
zfs destroy ${NBOOTFS}@flar

When we Jumpstart the system, Solaris is installed, but it really isn't used. Then we load a whole new OS dataset from the send stream, make it bootable, set our identity in it, and use it. When the system is booted, Jumpstart still takes care of updating the boot archives in the new bootfs.

On the whole, this is a lot more work than Flash, and is really not as flexible or as complete. But hopefully, until Flash is supported with a ZFS root and Jumpstart, this might at least give you an idea of how you can replicate systems and do installations that do not have to revert back to package-based installation.

Many people use Flash as a form of disaster recovery. I think this same approach might be used there as well. It's still not as clean or complete as Flash, but it might work in a pinch.

So, what do you think? I would love to hear comments on this as a stop-gap approach.

Wednesday Oct 10, 2007

CEC Day One & Two Recap

So, CEC is actually almost over. It's been a whirlwind of sessions and meet-ups with folks, filling my head with new stuff. And, of course, in the midst of all the CEC excitement, there's still the need to keep up with what customers back home need.

So, what was exciting from days one and two? Lots!


  • Jon Haslam and Simon Ritter gave a great talk and demo about using DTrace along with Java. I am absolutely not a developer; I've never even written "Hello World" in Java. But this really helped me understand how DTrace and Java are two great tastes that go great together. And with the newer JVMs, it really is a case of "Hey, you got your DTrace in my Java!", "No, you got your Java in my DTrace!" This all comes at a great time -- I have to do a presentation on Wednesday in Florida on exactly this topic.
  • Matt Ingenthron and Shanti gave a great talk about the various working parts and commonly used components and tools in a modern web infrastructure. Really helped me figure out how the pieces fit together.
  • Tim Cook had a great talk comparing the various file system offerings from Sun and others for OLTP workloads on large systems. He gave us some handy, simple, best practices for each and worked to bust some commonly held myths and misconceptions.
  • Tim Bray shared his perspective on what really is important about a Web 2.0 world and how the things in that world can really matter to an enterprise. He talked about the fact that, in the end, time to market and manageability are the overwhelming priorities for enterprises in selecting tools and techniques for application development and deployment. I am really inspired to go out and finally learn more about Ruby and Rails as a result.
Of course, there were more. These are just some of the highlights that come to mind quickly. As always, CEC was a great trip and well worth the effort (but I still dislike Las Vegas - a lot). And like Juan Antonio Samaranch at the Olympics, this CEC is about to be declared over, realized to be the best yet, and we will agree to meet again next year. I, for one, am looking forward to it. Time to start working on a topic for my presentation!

Monday Oct 08, 2007

CEC2007 - Early Day One - Initial Impressions

So, it's the first day of CEC, Sun's Customer Engineering Conference. This year, there are about 4000 of us hanging out at the Paris and Bally's hotels in Las Vegas. Systems engineers, folks from Sun's various practices, service and support engineers, architects, and folks from headquarters engineering are all here. But we also have a huge number of our partners - resellers, OEMs, developers, etc.

Last night was our Networking Reception. Great to see folks again that I had not seen in a while and to meet lots of new faces.

Today, we start with opening sessions from Hal Stern, Dan Berg, Jim Baty, and a host of others. Then, we get into, for me, the guts of CEC - the breakout sessions. There are over 240 sessions, selected from a pool of over 700 submissions. I'm talking (Tuesday, 6PM, Versailles ballroom 3 & 4) on Dynamic Resource Pools in Solaris 10. I'll post my slides after the talk. If you are at the conference, come on over. I understand my talk will also be available in Second Life. I'm still trying to figure out how all of that works, though.

Here are some of my initial + and - observations from CEC so far:

  • Plus - Paris is great. Very lovely hotel. The look really captures all that you might remember and love from Paris.
  • Plus - I scored a deluxe room - corner room, view of the Bellagio fountains, windows on two walls.
  • Plus - Check-in logistics. Got through even the really long materials line in less than 10 minutes.
  • Plus - Networking Reception - Food was good and plentiful. Double plus for the desserts. Great to see folks. Last year, I missed the reception since I got in late.
  • Plus / Minus - In-room network. Fastest hotel network I have had in as long as I can recall. But it costs $13/day.
  • Minus - The room for meals was really, really, really crowded at breakfast. I can only imagine what it will be like as folks try to rush through for lunch. And no sodas. Last year, folks finally got it that geeks often take their caffeine in carbonated form.
  • Minus - Having the agenda only online via schedule builder has made it sort of inconvenient to select sessions, alter your plans, and pick new things on the fly. Same as last year. Sometimes paper really is useful.
  • Minus - Smoke - Las Vegas is smoky. Seems that they are managing it better now than in years past, but in these days of smoke-free public spaces, it's really noticeable.
  • Big Minus - For me, Las Vegas is absolutely not my top choice for a venue. This is a very uncomfortable place. Maybe I'm just a stick in the mud or a prude or old in my thinking, but this town is built around too many things that really make me uncomfortable.

All in all, though, I am excited about a great conference and expect to be really tired when I get home!

Web 3.0 - The Official Reason we need it

Jason Calacanis has posted his "official" definition of Web 3.0. He says "Web 3.0 is defined as the creation of high-quality content and services produced by gifted individuals using Web 2.0 technology as an enabling platform."

The same day I saw this, I also saw, on Keith Bostic's fabulous /dev/null mailing list, a link to Cracked.com's The 8 Most Needlessly Detailed Wikipedia Entries. Even though all of these folks are clearly authorities in their field, are we really getting the "wisdom of the crowd"? Geek and Poke gets it pretty right.

ATLOSUG - Oct 9. Cancelled

Sun's Customer Engineering Conference is going on this week in Las Vegas. As a result, we've had to cancel our October meeting of ATLOSUG - We're all in Las Vegas.

Sorry for the inconvenience. We will pick up with our meetings in November. Ryan Matteson, from Ning, will be our speaker. Should be a really good meeting. Details on the topic to follow.

Tuesday Jun 26, 2007

I've been everywhere man, I've been everywhere

I feel like that Johnny Cash song (which I think maybe Jimmie Rodgers did first - can't recall for sure).  Seems like for the last several months, I've been on the road doing Solaris bootcamps, best practices workshops, and all sorts of other things Solaris.  I've seen a lot of interesting places and met lots of interesting folks.  Just in the last few weeks, I've been to:
  • Bismarck, ND, Sioux Falls, SD, and Fargo, ND for University Solaris Bootcamps.  Got to see lots of that area driving from one to another across the secondary highways.  Thanks to Greg Stromme from Applied Engineering, Sun's reseller partner in that geography, for driving me around and showing me places I'd never been before.  We saw the home places of Lawrence Welk and Laura Ingalls Wilder, plus lots of wide-open territory.
  • Conway, Arkansas for a Solaris resource management workshop.  Got to see a cousin in Russellville this trip.
  • Austin, Texas for a Solaris virtualization workshop.
  • Baton Rouge, LA for a University Solaris bootcamp.  Got to see a cousin here, too.
  • Huntsville, AL for various Solaris briefings.
And that's just the last six weeks!  I'm kind of thankful for the end of the quarter and the year coming up.  I have no tickets booked until the end of July right now!



Wednesday Jan 24, 2007

I guess it's a good thing....

I am amazed and awed by all of the folks on BSC who are able to contribute great content \*and\* get their jobs done!  I find that even when I want to share something, there just don't seem to be enough hours in the day to get the job done, talk to and support the customers, and then put something together that makes enough sense to share.

 How do you guys do it?  Or do you never sleep?
 

Friday Dec 01, 2006

Fun with zvols - UFS on a zvol

Continuing with some of the ideas around zvols, I wondered about UFS on a zvol.  On the surface, this appears to be sort of redundant and not very sensible.  But thinking about it, there are some real advantages:

  • I can take advantage of the data integrity and self-healing features of ZFS since this is below the filesystem layer.
  • I can easily create new volumes for filesystems and grow existing ones
  • I can make snapshots of the volume, sharing the ZFS snapshot flexibility with UFS - very cool
  • In the future, I should be able to do things like have an encrypted UFS (sort-of) and secure deletion

Creating UFS filesystems on zvols

Creating a UFS filesystem on a zvol is pretty trivial.  In this example, we'll create a mirrored pool and then build a UFS filesystem in a zvol.

bash-3.00# zpool create p mirror c2t10d0 c2t11d0 mirror c2t12d0 c2t13d0
bash-3.00# zfs create -V 2g p/v1
bash-3.00# zfs list
NAME     USED  AVAIL  REFER  MOUNTPOINT
p       4.00G  29.0G  24.5K  /p
p/v1    22.5K  31.0G  22.5K  -
bash-3.00# newfs /dev/zvol/rdsk/p/v1
newfs: construct a new file system /dev/zvol/rdsk/p/v1: (y/n)? y
Warning: 2082 sector(s) in last cylinder unallocated
/dev/zvol/rdsk/p/v1:    4194270 sectors in 683 cylinders of 48 tracks, 128 sectors
        2048.0MB in 43 cyl groups (16 c/g, 48.00MB/g, 11648 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
3248288, 3346720, 3445152, 3543584, 3642016, 3740448, 3838880, 3937312,
4035744, 4134176
bash-3.00# mkdir /fs1
bash-3.00# mount /dev/zvol/dsk/p/v1 /fs1
bash-3.00# df -h /fs1
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     1.9G   2.0M   1.9G     1%    /fs1

Nothing much to it. 

Growing UFS filesystems on zvols

But what if I run out of space?  Well, just as you can add disks to a pool and grow the size of a volume, you can grow the size of a zvol.  Since the UFS filesystem is a data structure inside the zvol container, you have to grow it as well.  Were I using just ZFS, the size of the file system would grow and shrink dynamically with the amount of data in it.  But a UFS has a fixed size, so it has to be expanded manually, with growfs, to accommodate the enlarged volume.  Note that this seems to have quit working between b45 and b53, so I just filed a bug on it.

bash-3.00# uname -a
SunOS atl-sewr-158-154 5.11 snv_45 sun4u sparc SUNW,Sun-Fire-480R
bash-3.00# zfs create -V 1g bsd/v1
bash-3.00# newfs /dev/zvol/rdsk/bsd/v1
...
bash-3.00# zfs set volsize=2g bsd/v1
bash-3.00# growfs /dev/zvol/rdsk/bsd/v1
Warning: 2048 sector(s) in last cylinder unallocated
/dev/zvol/rdsk/bsd/v1:  4194304 sectors in 683 cylinders of 48 tracks, 128 sectors
        2048.0MB in 49 cyl groups (14 c/g, 42.00MB/g, 20160 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 86176, 172320, 258464, 344608, 430752, 516896, 603040, 689184, 775328,
3359648, 3445792, 3531936, 3618080, 3704224, 3790368, 3876512, 3962656,
4048800, 4134944

What about compression? 

Along the same lines as growing the file system, I suppose you could turn compression on for the zvol.  But since the UFS is of fixed size, it won't especially help with fitting more data into the file system: you can't put more into the filesystem than the filesystem thinks it can hold, even if it isn't actually using that much space on disk.  Here's a little demonstration of that.

First, we will loop through, creating 200MB files in a 1GB file system with no compression.  We will use blocks of zeros, since these will compress dramatically when we repeat the experiment with compression turned on.

bash-3.00# zfs create -V 1g p/v1
bash-3.00# zfs get used,volsize,compressratio p/v1
NAME  PROPERTY       VALUE    SOURCE
p/v1  used           22.5K    -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
bash-3.00# newfs /dev/zvol/rdsk/p/v1
...
bash-3.00# mount /dev/zvol/dsk/p/v1 /fs1
bash-3.00#
bash-3.00# for f in f1 f2 f3 f4 f5 f6 f7 ; do
> dd if=/dev/zero bs=1024k count=200 of=/fs1/$f
> df -h /fs1
> zfs get used,volsize,compressratio p/v1
> done

200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     962M   201M   703M    23%    /fs1
NAME  PROPERTY       VALUE    SOURCE
p/v1  used           62.5M    -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     962M   401M   503M    45%    /fs1
NAME  PROPERTY       VALUE    SOURCE
p/v1  used           149M     -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     962M   601M   303M    67%    /fs1
NAME  PROPERTY       VALUE    SOURCE
p/v1  used           377M     -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     962M   801M   103M    89%    /fs1
NAME  PROPERTY       VALUE    SOURCE
p/v1  used           497M     -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
dd: unexpected short write, wrote 507904 bytes, expected 1048576
161+0 records in
161+0 records out
Dec  1 14:53:04 atl-sewr-158-122 ufs: NOTICE: alloc: /fs1: file system full

bash-3.00# zfs get used,volsize,compressratio p/v1
NAME  PROPERTY       VALUE    SOURCE
p/v1  used           1.00G    -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
bash-3.00#

So, you see that it fails while writing the 5th 200MB chunk, which is what you would expect.  Now, let's do the same thing with compression turned on for the volume.

bash-3.00# zfs create -V 1g p/v2
bash-3.00# zfs set compression=on p/v2
bash-3.00# newfs /dev/zvol/rdsk/p/v2
...
bash-3.00#
bash-3.00# mount /dev/zvol/dsk/p/v2 /fs2
bash-3.00# for f in f1 f2 f3 f4 f5 f6 f7 ; do
> dd if=/dev/zero bs=1024k count=200 of=/fs2/$f
> df -h /fs2
> zfs get used,volsize,compressratio p/v2
> done
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v2     962M   201M   703M    23%    /fs2
NAME  PROPERTY       VALUE    SOURCE
p/v2  used           8.58M    -
p/v2  volsize        1G       -
p/v2  compressratio  7.65x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v2     962M   401M   503M    45%    /fs2
NAME  PROPERTY       VALUE    SOURCE
p/v2  used           8.58M    -
p/v2  volsize        1G       -
p/v2  compressratio  7.65x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v2     962M   601M   303M    67%    /fs2
NAME  PROPERTY       VALUE    SOURCE
p/v2  used           8.83M    -
p/v2  volsize        1G       -
p/v2  compressratio  7.50x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v2     962M   801M   103M    89%    /fs2
NAME  PROPERTY       VALUE    SOURCE
p/v2  used           8.83M    -
p/v2  volsize        1G       -
p/v2  compressratio  7.50x    -
dd: unexpected short write, wrote 507904 bytes, expected 1048576
161+0 records in
161+0 records out
Dec  1 15:16:42 atl-sewr-158-122 ufs: NOTICE: alloc: /fs2: file system full

bash-3.00# zfs get used,volsize,compressratio p/v2
NAME  PROPERTY       VALUE    SOURCE
p/v2  used           9.54M    -
p/v2  volsize        1G       -
p/v2  compressratio  7.07x    -
bash-3.00# df -h /fs2
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v2     962M   962M     0K   100%    /fs2
bash-3.00#

This time, even though the volume was using hardly any pool space at all, the file system was still full at the same point.  So compression here is valuable from a space management standpoint - the full 1GB filesystem consumed under 10MB of pool space - even though it doesn't let you fit more data into the UFS.  Depending on the contents of the filesystem, compression may also help performance by turning multiple I/Os into fewer I/Os.

The Cool Stuff - Snapshots and Clones with UFS on Zvols

One of the things that is not available in UFS is the ability to create multiple snapshots quickly and easily.  The fssnap(1M) command allows me to create a single, read-only snapshot of a UFS file system, and it requires a separate location for backing store, to hold the prior contents of any files changed or deleted in the master image during the lifetime of the snapshot.

ZFS offers the ability to create many snapshots of a ZFS filesystem quickly and easily.  This ability extends to zvols, as it turns out.

For this example, we will create a volume, fill it up with some data and then play around with taking some snapshots of it.  We will just tar over the Java JDK so there are some files in the file system. 

bash-3.00# zfs create -V 2g p/v1
bash-3.00# newfs /dev/zvol/rdsk/p/v1
...
bash-3.00# mount /dev/zvol/dsk/p/v1 /fs1
bash-3.00# tar cf -  ./jdk/ | (cd /fs1 ; tar xf - )
bash-3.00# df -h /fs1
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     1.9G   431M   1.5G    23%    /fs1
bash-3.00# zfs list
NAME     USED  AVAIL  REFER  MOUNTPOINT
p       4.00G  29.0G  24.5K  /p
p/swap  22.5K  31.0G  22.5K  -
p/v1     531M  30.5G   531M  -

Now, we will create a snapshot of the volume, just as for any other ZFS file system.  As it turns out, this creates new device nodes in /dev/zvol for the block and character devices.  We can mount them as UFS file systems the same as always.

bash-3.00# zfs snapshot p/v1@s1  # Make the snapshot
bash-3.00# zfs list # See that it's really there
NAME      USED  AVAIL  REFER  MOUNTPOINT
p        4.00G  29.0G  24.5K  /p
p/swap   22.5K  31.0G  22.5K  -
p/v1      531M  30.5G   531M  -
p/v1@s1      0      -   531M  -
bash-3.00# mkdir /fs1-s1
bash-3.00# mount  /dev/zvol/dsk/p/v1@s1 /fs1-s1 # Mount it
mount: /dev/zvol/dsk/p/v1@s1 write-protected # Snapshots are read-only, so this fails
bash-3.00# mount -o ro  /dev/zvol/dsk/p/v1@s1 /fs1-s1 # Mount again read-only
bash-3.00# df -h /fs1-s1 /fs1
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1@s1
                       1.9G   431M   1.5G    23%    /fs1-s1
/dev/zvol/dsk/p/v1     1.9G   431M   1.5G    23%    /fs1
bash-3.00#

At this point /fs1-s1 is a read-only snapshot of /fs1.  If I delete files, create files, or change files in /fs1, those changes will not be reflected in /fs1-s1.

bash-3.00# ls /fs1/jdk
instances    jdk1.5.0_08  jdk1.6.0     latest       packages
bash-3.00# rm -rf /fs1/jdk/instances
bash-3.00# df -h /fs1 /fs1-s1
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     1.9G    61M   1.8G     4%    /fs1
/dev/zvol/dsk/p/v1@s1
                       1.9G   431M   1.5G    23%    /fs1-s1
bash-3.00#

You can create multiple snapshots just as easily.  And as with any other ZFS file system, you can roll back a snapshot and make it the master again (below, I roll back to a second snapshot, s2, taken after the instances directory was removed).  You have to unmount the filesystem in order to do this, since the rollback happens at the volume level; changing the volume underneath a mounted UFS would leave UFS confused about the state of things.  But ZFS catches this, too.


bash-3.00# ls /fs1/jdk/
jdk1.5.0_08  jdk1.6.0     latest       packages
bash-3.00# rm /fs1/jdk/jdk1.6.0
bash-3.00# ls /fs1/jdk/
jdk1.5.0_08  latest       packages
bash-3.00# zfs list
NAME      USED  AVAIL  REFER  MOUNTPOINT
p        4.00G  29.0G  24.5K  /p
p/swap   22.5K  31.0G  22.5K  -
p/v1      535M  30.5G   531M  -
p/v1@s1  4.33M      -   531M  -
bash-3.00# zfs rollback p/v1@s2 # /fs1 is still mounted.
cannot remove device links for 'p/v1': dataset is busy
bash-3.00# umount /fs1
bash-3.00# zfs rollback p/v1@s2
bash-3.00# mount /dev/zvol/dsk/p/v1 /fs1
bash-3.00# ls /fs1/jdk
jdk1.5.0_08  jdk1.6.0     latest       packages
bash-3.00#

I can create additional read-write instances of a volume by cloning the snapshot.  The clone and the master file system will share the same on-disk objects for data that remains unchanged, while new on-disk objects will be created for any files that are changed, either in the master or in the clone.


bash-3.00# ls /fs1/jdk
jdk1.5.0_08  jdk1.6.0     latest       packages
bash-3.00# zfs snapshot p/v1@s1
bash-3.00# zfs clone p/v1@s1 p/c1
bash-3.00# zfs list
NAME      USED  AVAIL  REFER  MOUNTPOINT
p        4.00G  29.0G  24.5K  /p
p/c1         0  29.0G   531M  -
p/swap   22.5K  31.0G  22.5K  -
p/v1      531M  30.5G   531M  -
p/v1@s1      0      -   531M  -
bash-3.00# mkdir /c1
bash-3.00# mount /dev/zvol/dsk/p/c1 /c1
bash-3.00# ls /c1/jdk
jdk1.5.0_08  jdk1.6.0     latest       packages
bash-3.00# df -h /fs1 /c1
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     1.9G    61M   1.8G     4%    /fs1
/dev/zvol/dsk/p/c1     1.9G    61M   1.8G     4%    /c1
bash-3.00#

I am pretty sure that this isn't exactly what the ZFS guys had in mind when they set out to build all of this, but it is pretty cool.  Now I can create UFS snapshots without having to specify a backing store.  I can create clones, promote the clones to be the master, and do the other things that I can do in ZFS.  I still have to manage the mounts myself, but I'm better off than before.

I have not tried any sort of performance testing on these.  Dominic Kay has just written a nice blog about using filebench to compare ZFS and VxFS.  Maybe I can use some of that work to see how things go with UFS on top of ZFS.

As always, comments, etc. are welcome!

Fun with zvols - Swap on a zvol

I mentioned recently that I just spent a week in a ZFS internals TOI. I got a few ideas there to play with, which I will share. Hopefully folks will have suggestions on how to improve / test / validate some of these things.

ZVOLs as Swap

The first thing that I thought about was using a zvol as a swap device. Of course, this is right there in the zfs(1M) man page as an example, but it still deserves a mention here.  There has been some discussion of this on the zfs-discuss list at opensolaris.org (I just retyped that dot four times thinking it was a comma. Turns out there was crud on my laptop screen).  The dump device cannot be on a zvol (at least if you want to catch a crash dump), but this still gives a lot of flexibility.  With root on ZFS (coming before too long), ZFS swap makes a lot of sense and is the natural choice. We were talking in class about whether there should be a way to turn off ZFS's caching for the swap device to improve performance, but that remains to be seen.

At any rate, setting up mirrored swap with ZFS is way simple! Much simpler even than with SVM, which in turn is simpler than VxVM. Here's all it takes:


bash-3.00# zpool create -f p mirror c2t10d0 c2t11d0
bash-3.00# zfs create -V 2g p/swap
bash-3.00# swap -a /dev/zvol/dsk/p/swap

Pretty darn simple, if you ask me. You can make it permanent by changing the swap line in your /etc/vfstab (below).  Notice that you use the path to the zvol in the /dev tree rather than the ZFS dataset name.


bash-3.00# cat /etc/vfstab
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#
#/dev/dsk/c1t0d0s1 - - swap - no -
/dev/zvol/dsk/p/swap - - swap - no -

I would like to do some performance testing to see what kind of performance you get with swap on a zvol.  I am curious how this will affect kernel memory usage, and about the effect of things like compression on the swap volume - though thinking about that one, it doesn't make a lot of sense.  I am also curious about the ability to dynamically change the size of the swap space.  At first glance, changing the size of the volume does not automatically change the amount of available swap space.  That makes sense for expanding swap space.  But if you reduce the size of the volume and the kernel doesn't notice, that sounds like it could be a problem.  Maybe I should file a bug.
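
One workaround sketch for growing swap (an untested assumption on my part, not something from the class): resize the volume, then delete and re-add the swap device so the kernel sees the new size.

# Grow the zvol, then cycle the swap device so the kernel picks up
# the new size.  swap -d must be able to evacuate the device first.
zfs set volsize=4g p/swap
swap -d /dev/zvol/dsk/p/swap
swap -a /dev/zvol/dsk/p/swap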

Suggestions for things to try and ways to measure overhead and performance for this are welcomed.

Thursday Nov 30, 2006

ZILs and ZAPs and ARCs - Oh My!

I just spent the last four days in a ZFS Internals TOI, given by George Wilson from RPE.  This just reinforces my belief that the folks who build OpenSolaris (and most any complex software product, actually) have a special gift.  How one can conceive of all of the various parts and pieces to bring together something as cool as OpenSolaris or ZFS or DTrace is beyond me.

By way of full disclosure, I ought to admit that the main thing I learned in graduate school and while working as a developer in a CO-OP job at IBM was that I hate development.  I am not cut out for it and have no patience for it.

Anyway, spending a week in the ZFS source actually helps you figure out how to best use the tool at a user level.  You see how things fit together, and this helps you figure out how to build solutions.  I got a ton of good ideas about things you might do with ZFS even without moving all of your data to it.  I don't know whether they will pan out or not, but they are ideas to play around with.  More about that later.

The same kind of thing applies to the internals of the kernel.  Whether or not you are a kernel programmer, you can be a better developer and a better system administrator if you have a notion of how the pieces of the kernel fit together.  Sun Education is now offering a Solaris 10 Operating System internals class, previously only offered internally at Sun.  Since Solaris has been open-sourced, the internal Internals is now an external Internals!  If you have a chance, take this class!  I take it every couple of Solaris releases and never regret it.

But, mostly I want to say a special thanks to George Wilson and the RPE team for putting together a fantastic training event and for allowing me, from the SE / non-developer side of the house to sit in and bask in the glow of those who actually make things for a living.

Tuesday Oct 03, 2006

CEC Day 1

Got into SFO Sunday evening and went straightaway to the reception at the Hilton. It's always great to see the folks that you have worked with over the years and don't get to see very often. Networking is as important as anything else at these events. If social networking is important in a Web 2.0 world, it only got that way because social networking in our day to day life is how we get stuff accomplished.

Laura Ramsey and I hosted an OpenSolaris BOF at lunchtime. We had a pretty good crowd and heard short "lightning talks" from a number of folks:

  • Ienup Sung talked about Internationalization and Localization in OpenSolaris
  • Ken Drachnik talked about Glassfish
  • Jeff Savit talked about some cool stuff going on with ports of OpenSolaris to "alternative platforms." More on this as it is ready for prime time.
  • Iwan Rahabok talked about Singanix and the OpenSolaris user group in Singapore
  • Bruno Gillet talked about how he uses OpenSolaris as a tool to demonstrate new features that will appear in Solaris, and how important OpenSolaris can be to Sun's engineers as a day-to-day tool.

After lunch, the breakout sessions began, the real reason we come to CEC.

  • I heard Jim Mauro talk (and make himself tired in a mad dash through more slides than minutes) on Solaris POD - Performance, Observability, and Debugging - tools.
  • I saw a talk about new features in Sun Cluster 3.2 that make upgrades of not only the cluster, but also the OS and application easier and with less interruption in service.
  • I saw two good talks on ZFS. One by Detlef Drewanz and Constantin Gonzalez on how they use ZFS and some of the reliability metrics around various configurations of disks. And one by Roch Bourbonnais about some of the ZFS implementation details. That one just whetted my appetite for the week-long deep dive into ZFS internals I hope to attend in December.

Now, it's time to start again. Andy B. is on tap today for the general session. I plan to hear Richard Elling talk about RAS for sure, but I don't know what else. Busy, Busy, Busy!!


This CEC thing will wear you out!

I had these grand intentions of writing some sort of brilliant synopsis of all the amazing things I had seen during the first day of CEC, but this CEC stuff will wear you right out! Long, long days full of firehoses to the brain make you just want to sleep. Couple that with the fact that the folks back home on east-coast time still want you to do your day job and be on con calls convenient to them. 5am and 6am calls start the day, followed by sessions until 7pm. Then the after-session stuff goes on until late. Then you collapse in your room. But, wait... didn't you commit to putting together a slide deck and sending it out before tomorrow morning? Gotta get that done. Finally to bed after 12, only to start again at 5am.....

It's a great opportunity to be here at CEC. Tons to see and hear and learn about that I can take home and use directly, but this CEC thing will wear you out!


Saturday Sep 30, 2006

CEC 2006, Here I Come!

I'm on my way, like so many others at Sun, to CEC 2006 on Sunday. Sounds like there will be nearly 3000 Sun engineers and architects from around the world convening at the Moscone Center in San Francisco. This year is the first time that I have attended without presenting a paper. Maybe I'll get to see more presentations this way!

One of the highlights of CEC is the many BOFs - Birds of a Feather sessions. Laura Ramsey and I are hosting a BOF for OpenSolaris on Monday over lunch. The plan is to have several Lightning Talks - 5-8 minute, very brief presentations on a variety of topics. We've got Lightning Talks lined up on Security, Trusted Extensions, I18N & L10N, Glassfish, and a bunch of other stuff. Shame we have only about an hour for the meeting. If you are at CEC and are looking for a BOF to attend on Monday, try the OpenSolaris one!

Also like others, I am combining CEC with an Ambassador meeting, but for me it's OS Ambassadors rather than DC Ambassadors. 50 or so of us from around the world who focus on Solaris will get together with Solaris engineering and marketing. It's always a great meeting and a good time to see folks that you don't see very frequently.

So, look for a few more blog entries here on things that I see that might be interesting to pass along.

Technorati Tags: cec2006

Monday Sep 18, 2006

Thanks, Ryan, for a great OpenSolaris User Group meeting

The Atlanta OpenSolaris User Group met last Tuesday, and it has taken me a week to get my head above water enough to mention it. Ryan Matteson from Earthlink, battling a nasty cold, did a great job. His presentation was on Brendan Gregg's DTrace Toolkit and how system administrators can make good use of DTrace. His slides are on his blog at prefetch.net. We ended up with about 25 people for this meeting.

Thanks to Forsythe Systems, our sponsor for this meeting, for providing the refreshments.

The next meeting of the Atlanta OpenSolaris User Group will be Tuesday, Oct. 10, in the Sun office in Alpharetta, GA. More details are here.

Thursday Sep 07, 2006

Atlanta OpenSolaris User Group - September 12

The next meeting of the Atlanta OpenSolaris User Group will be Tuesday, Sept 12, at 7 PM in the offices of Sun Microsystems. Sun is located at 3655 North Point Parkway in Alpharetta, GA. For directions and details, see the ATLOSUG web site at http://www.opensolaris.org/os/community/os_user_groups/atl-osug/

The topic for this meeting will be the DTrace Toolkit. Ryan Matteson, of Earthlink, will present. The DTrace Toolkit is a collection of tools built on top of DTrace for system and application monitoring and observation.

Please RSVP to Scott.Dickson@sun.com if you plan to attend. We need to have at least a rough count for refreshments.

I've built it, but now what?

ZFS on a box like the Sun Fire X4500 is way cool. But what if all you have is old, controller-based storage? George Wilson and I were wondering about that and thought it might be useful to do some experimentation along that line. So we collected all of the currently unused storage in my lab and built a big ZFS farm. We've got a V480 with 7 T3B and 8 T3 bricks connected via SANbox-1 switches, along with a couple of TB of IBM Shark storage recabled to be JBOD. I have a 3510 and maybe some Adaptec RAID storage that I can hook up eventually.

So, the server is up and running with 3 racks of storage, keeping the lab nice and toasty. Now what?!

What might be the best way to manage the T3s in a ZFS world? As a first pass, I split each brick into 2 RAID5 LUNs with a shared spare drive. But maybe I would be better off just creating a single stripe with no RAID in the T3 and letting ZFS handle the mirroring. Then again, I've had a number of disk errors (these are all really, really, really old) that the T3 fixed on its own without bothering the ZFS pool. Maybe RAID5 in the brick is the right approach. I could argue it either way.
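
To make the two layouts concrete, here is roughly how each pool might be created (the device names are made up; yours will differ):

# Option 1: RAID5 done inside the T3 bricks; ZFS stripes across the
# LUNs and leaves the redundancy to the hardware.
zpool create tank c4t1d0 c4t1d1 c5t1d0 c5t1d1

# Option 2: plain stripes exported from the bricks; ZFS mirrors LUNs
# across bricks and handles the redundancy itself.
zpool create tank mirror c4t1d0 c5t1d0 mirror c4t1d1 c5t1d1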

Feel free to share your suggestions on what might be a good configuration here and why. I'm happy to test out several different approaches.

Wednesday Jul 12, 2006

Atlanta OpenSolaris User Group - July 12

The Atlanta OpenSolaris User Group had a great monthly meeting last night. Alok Aggarwal presented on NFSv4 to a group of about 15 and fielded quite a lot of questions, ranging from how NFSv4 works to how to use the DTrace provider for NFSv4. Good meeting. Check http://opensolaris.org/os/community/os_user_groups/atl-osug/ for slides and meeting details.

No cake and pictures this month, but a big thank-you to Intelligent Technology Systems for sponsoring us and bringing the pizza.

Our next meeting will be August 8 in the Alpharetta, Georgia Sun office. Check the ATLOSUG web site for details and directions.

Wednesday Jun 14, 2006

Hotlanta Celebrates One Year of OpenSolaris


The Atlanta OpenSolaris User Group launched a bit of an early birthday celebration for our good friend, OpenSolaris, last night with a rousing meeting. George Wilson, from Sun's Revenue Products Engineering group, gave us an update on what's new in ZFS lately. I have to say that I am more and more impressed with the things that you can and will be able to do with ZFS. George and I were talking about how one might use promotion of cloned ZFS filesystems as part of a QA and patching process, especially for zones sitting in a ZFS filesystem. I am not yet sure exactly how all of this might work, but I think it has promise.


George also talked about using ZFS for the root filesystem and booting from a ZFS filesystem. Also very cool. Seems to me like this has a lot of benefits. You never will have to go through the pain of backing up and restoring a root drive to resize /var or /opt! Plus, you get the added safety and security of ZFS. Old-timers who want to see a root disk that looks like a simple disk may have to rethink things a little, but I think the added benefits will outweigh the effort of change.


After George's talk, I took the stage and talked about integrating Zones and ZFS. I'm pretty excited about this. On the one hand, being able to use ZFS to provide application data space to a zone allows the zone administrator to take on the responsibility of managing their own filesystems to fit their needs, without bothering the global platform administrator. On the other hand, using ZFS for the zoneroot, I can easily and quickly create new zones, cloning them from a master and using ZFS properties to keep them from stomping on one another. All very cool. I have to congratulate the whole ZFS team (and the Zones team).
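
To sketch the cloning idea (the dataset names, zone name, and quota here are all hypothetical):

# Snapshot a fully built master zoneroot, clone it for a new zone,
# and set a quota so the clones can't stomp on one another's space.
zfs snapshot zones/master@gold
zfs clone zones/master@gold zones/z5
zfs set quota=4g zones/z5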


I am looking forward to our next meeting - July 11 - when we will hear from Alok Aggarwal on NFSv4. We got a good list of suggested topics that should keep us going through the fall.

Friday Jun 09, 2006

Atlanta OpenSolaris User Group - June 13

The Atlanta OpenSolaris User Group is having its next meeting on Tuesday, June 13, at 7:00 PM in the Alpharetta, GA Sun Office. Details and directions can be found at http://opensolaris.org/os/community/os_user_groups/atl-osug/

Our speakers for this meeting will be George Wilson from Sun's engineering group talking about ZFS as a Root Filesystem, and Scott Dickson talking about integrating ZFS and Zones.

Come out and help us celebrate the 1st birthday of OpenSolaris!

Monday May 15, 2006

It's neat, but is it useful?

Sometimes weird ideas occur to me while I'm on airplanes. The other day, while flying to a customer engagement, I was thinking about the fact that customers often ask about how to manage usernames and passwords between the global zone and non-global zones in Solaris 10. Certainly, you can use a centrally managed solution such as LDAP or NIS, but many of these customers don't have anything like that. Moreover, they only have a few users on any particular system and want all of the users in the global zone to be known in the non-global zones as well.


So, this got me to thinking. What if we use loopback mounts for things like /etc/passwd and /etc/shadow? Hey, yeah! That's the ticket! That might work! If I make a read-only mount of these files, I bet I can access them in the non-global zone. And because they are read-only, they end up being managed from the global zone and are less likely to be a security problem.


And what about /etc/hosts? Well, probably there's DNS, but not necessarily. I have customers who have 50,000+ line host files. They would love to share these, too. So, why not mount /etc/inet while we're at it?


Here's what I did. I have a zone called z4 whose zoneroot is located at /zones/z4. I had already created this zone, so I will just use zonecfg to make some modifications to the existing configuration:



global# mv /zones/z4/root/etc/passwd /zones/z4/root/etc/passwd.safe
global# mv /zones/z4/root/etc/shadow /zones/z4/root/etc/shadow.safe
global# zonecfg -z z4
zonecfg:z4> add fs
zonecfg:z4:fs> set dir=/etc/passwd
zonecfg:z4:fs> set special=/etc/passwd
zonecfg:z4:fs> set type=lofs
zonecfg:z4:fs> add options [ro,nodevices]
zonecfg:z4:fs> end
zonecfg:z4> add fs
zonecfg:z4:fs> set dir=/etc/shadow
zonecfg:z4:fs> set special=/etc/shadow
zonecfg:z4:fs> set type=lofs
zonecfg:z4:fs> add options [ro,nodevices]
zonecfg:z4:fs> end
zonecfg:z4> add fs
zonecfg:z4:fs> set dir=/etc/inet
zonecfg:z4:fs> set special=/etc/inet
zonecfg:z4:fs> set type=lofs
zonecfg:z4:fs> add options [ro,nodevices]
zonecfg:z4:fs> end
zonecfg:z4> verify
zonecfg:z4> commit
zonecfg:z4> ^D

When I boot up the zone and take a look at what's mounted, I now see this:



# uname -a
SunOS z4 5.10 Generic_Patch i86pc i386 i86pc
# zonename
z4
# df -h
Filesystem             size   used  avail capacity  Mounted on
/                      5.9G   3.5G   2.3G    61%    /
/dev                   5.9G   3.5G   2.3G    61%    /dev
/etc/inet              5.9G   3.5G   2.3G    61%    /etc/inet
/etc/passwd            5.9G   3.5G   2.3G    61%    /etc/passwd
/etc/shadow            5.9G   3.5G   2.3G    61%    /etc/shadow
/lib                   5.9G   3.5G   2.3G    61%    /lib
/opt                   3.9G   1.6G   2.3G    42%    /opt
/platform              5.9G   3.5G   2.3G    61%    /platform
/sbin                  5.9G   3.5G   2.3G    61%    /sbin
/usr                   5.9G   3.5G   2.3G    61%    /usr
proc                     0K     0K     0K     0%    /proc
ctfs                     0K     0K     0K     0%    /system/contract
swap                   1.5G   240K   1.5G     1%    /etc/svc/volatile
mnttab                   0K     0K     0K     0%    /etc/mnttab
/usr/lib/libc/libc_hwcap2.so.1
                       5.9G   3.5G   2.3G    61%    /lib/libc.so.1
fd                       0K     0K     0K     0%    /dev/fd
swap                   1.5G     0K   1.5G     0%    /tmp
swap                   1.5G    16K   1.5G     1%    /var/run

Now, I can log directly into the zone using the same username and password as in the global zone. /etc/passwd, /etc/shadow, and /etc/inet are all mount points served from the global zone. This seems like it could be pretty cool, but I am not sure that it's really useful. What does anyone else think? Is this a technique that should be strongly discouraged? Or something that we need to document and encourage?


One thing that this makes me think of is a potential RFE for zonecfg. It would be nice to have some sort of include operator, so that common segments could be pulled into each zone's configuration. But maybe the right way to do this today is with a script.
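
In fact, zonecfg will read commands from a file with -f, which gets most of the way there. A sketch (the filename is made up): put the common entries in one file, ending with a commit, and replay it into each zone:

global# cat common-files.cfg
add fs
set dir=/etc/passwd
set special=/etc/passwd
set type=lofs
add options [ro,nodevices]
end
commit
global# zonecfg -z z4 -f common-files.cfg

Repeat the add fs block in the file for /etc/shadow and /etc/inet, and the same file configures every zone identically.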


Thoughts? Comments?

Saturday May 06, 2006

Atlanta OpenSolaris User Group - May 9

I can't believe that I've let things slip so long; my last post was in November, and here it is May already! Lots of news from ATLOSUG (the Atlanta OpenSolaris User Group). Since November, we have had meetings in January and March, moving around trying to find a better venue.

The last meeting was a great overview of ZFS, given by George Wilson, one of the engineers involved in the port of ZFS from Nevada back to Solaris 10. This was the first meeting held at the Sun office in Alpharetta, Georgia. We had a great turnout, with lots of discussion and questions; we could have gone on for another hour or more. Expect to hear more from George on ZFS in the coming months.


The next meeting of ATLOSUG will be May 9 at 7PM at the Sun office. Check the ATLOSUG site for directions and details. Matrix Resources is sponsoring this meeting for us, and we thank them for their support (and for the refreshments!). Our topic for this meeting is BrandZ - running Linux applications in a Solaris zone. Expect to see some slides, and then a bunch of demos: how to build and install branded zones, how to run applications in them, and some cool interactions between zones and ZFS, and zones and DTrace.


Another piece of news regarding ATLOSUG: starting with the May meeting, we are altering the schedule to meet monthly rather than bimonthly. It seems like there is enough going on and enough people interested to keep us going at that pace. So, the next meeting will be June 12, at the Sun office.


Regarding the location: admittedly, meeting locations are a challenge in the Atlanta area. As a stand-alone user group, we need a meeting location that doesn't cost very much, is accessible in the evening, and is as convenient to as large a portion of the city as possible. Meeting downtown or midtown is inconvenient for many folks on the north side, while meeting on the north side (at the Sun office, for example) makes attendance nearly impossible for ITP folks. Clearly, somewhere around the Perimeter is the best bet, but everything we have found so far is expensive. So, if you have an idea for a location on the top end of the Perimeter that's cheap, accessible, and available, please let me know.


And to anyone in Atlanta, we look forward to seeing you on May 9!

Wednesday Nov 09, 2005

All I can say is Wow!

The Atlanta OpenSolaris User Group kicked off last night with just over 50 attendees! There were about 30 who had signed up beforehand, and I would have been happy with 20 for this first meeting. I was floored that we had standing room only. All the food was gone; all the soda was gone; all the shirts were gone!

[Photo: Scott & George demo OpenSolaris]

The crowd braved the fierce Atlanta traffic to convene at the Crowne Plaza Ravinia hotel. Our future meetings will be held on campus at Georgia Tech, where we hope that students will get involved with OpenSolaris. As it turns out, the Atlanta Telecom Professionals were having their annual awards Gala at the same hotel, so it really was a case of braving the crowds and traffic.

But, just over 50 people turned out from all over town. Customers, iForce partners, recruiters, integrators, universities, commercial, Sun engineers - all sorts of folks.

As this was an organizational event, we talked about meeting mechanics, frequency, and so on. As I said, our future meetings, held the second Tuesday of odd-numbered months, will be in the Georgia Tech Student Center in midtown Atlanta at 7:00 PM, with networking and refreshments starting around 6:30. We're taking a lesson from the venerable Atlanta Unix Users Group and not trying to get complicated or fancy in our structure. Each meeting will include time for discussion, Q&A, and a presentation. We invite partners to sponsor meetings and help defray the cost of the refreshments.

Our kickoff presentation was an overview of OpenSolaris. Much thanks to Jerry Jelinek, whose slides provided a lot of background. You can find a recap of the meeting with photos and the slides here.

I think we're off to a great start! We have sponsors fighting over who gets to sponsor upcoming meetings, and we have speakers volunteering for most of the next year already! We may have to meet more frequently to get the speakers in.

Thanks so much to the folks who have been a great help, and will continue to be a great help - Crystal Nichols from Intelligent Technology Systems for covering logistics, and to George Wilson and Don Deal from Sun's Sustaining Engineering group for technical backup.

We'll see everyone at the next meeting on January 10!

Tuesday Oct 18, 2005

Atlanta OpenSolaris User Group Launches

We are kicking off an OpenSolaris User Group in the Atlanta area. Several customers have asked me about whether such a thing existed. Since it didn't, we're starting one! The first meeting will be on Tuesday, November 8, 7:00-9:00 PM, at the Crowne Plaza Ravinia hotel in Atlanta.

Subsequent meetings will be on the campus of Georgia Tech, in the Student Center (the room was already booked for the first meeting). We will meet on the second Tuesday of January, March, May, July, September, and November (every other month).

Both the web site and email discussion list are live already.

For the kickoff meeting, I will talk about OpenSolaris, what it is, and how to get involved. But we hope to have community members present at many of our subsequent meetings. Ryan Matteson from Earthlink has already volunteered to speak about an article he wrote on the DTrace Toolkit.

We hope that anyone in the Atlanta area interested in OpenSolaris, as well as the commercially distributed Solaris, will come by and help us get off to a great start!

Saturday Jul 23, 2005

Dixie is excited about Solaris 10!

It's been a long time since my last post. I feel like I have been on the road for months now, travelling around the South talking about Solaris. Everywhere I go, folks are excited about Solaris 10, OpenSolaris, and Solaris running on x86/x64. After being an OS Ambassador at Sun for 10 years, it's finally fashionable to focus on the OS. And that's a lot of fun.

Many of us OS Ambassadors have been presenting roughly six hours of Solaris 10 technical overview at a series of Solaris 10 Boot Camps held across the country. If one is in your area, try to take it in. Even if you are already a Solaris junkie, this is a great way to meet other folks in your area also interested in S10. For the most part, these events are hosted by colleges and universities and held on-campus, but are open to the community at large. I've been doing these in Florida, so far, and plan to travel to Mississippi for one next week. Who would have thought that there would be excitement for Solaris 10 in Mississippi?! But, we have between 50 and 100 people signed up in Jackson, MS for our event. There are a couple of these events planned for the West Coast (San Diego and Santa Clara, I think), as well as Atlanta, in the next month or so.

On top of the Boot Camps, I have visited dozens of customers, both very, very large and very, very small, and everything in between. Even if the customer is not currently running Solaris, Solaris 10, especially running on x86 hardware, is something that *they call us to hear about*! I've visited customers in Georgia, Tennessee, Florida, Alabama, and Virginia in the last couple of months, and the reception has always been the same - "This is way cool!"

It looks like my task for the next year will be focused around helping customers get Solaris 10 integrated into their environments. That's a pleasant task, to my way of thinking.

It's times like this that continue to make me happy to be at Sun and happy to be associated with Solaris and the people who make Solaris possible.

Monday Feb 28, 2005

CEC Final Morning

Monday morning was the final half-day of the Sun Customer Engineering Conference. We rounded out the morning with two breakout sessions and some time chatting with Scott McNealy.

The breakouts I made this morning were particularly good. I started with Liane Praza giving a talk about SMF for administrators. SMF is one of the particularly powerful new sets of features in Solaris 10, and Liane gave a great presentation on how all the pieces fit together. I can see a lot of promise for ISVs integrating their application software with SMF for higher levels of availability. One question that came to mind is: what sort of applications would be well served by having their own custom delegated restarter? One possible area I thought of is telco network applications. These sorts of apps often require special processing and go to great lengths to provide very high levels of reliability. With a delegated restarter tailored to their particular types of transactions, these core network apps could provide even higher levels of reliability.

My second breakout was another view of server consolidation, this one a session on lessons learned through an internal project to move internal applications to consolidated environments using zones. One thing that comes out over and over is that no matter the server consolidation approach being used, planning and operational maturity are the key components of a successful deployment. One interesting comment from this session was that the group doing the deployment felt they could make better progress and show positive ROI more quickly by approaching things in small, achievable chunks - 20-30 apps at a time - rather than a huge enterprise-wide analysis and deployment.

We finished out the morning with a presentation by Scott McNealy, with a pretty good question and answer session. Like Jonathan's, Scott's talk is always a highlight of this event. I believe that the senior executives at Sun really value the contribution, and understand the significance of the contribution, of the technology organizations in the field.

All in all, this was a great event. I'm definitely heading home with a big list of things to try to work on in my copious spare time. There are so many gems hidden down in Solaris that deserve attention so that I can share them with my customers. It's sort of like the guy who works at the hardware store: he has to know from his own experience the basics of what he sells, but he also has to know from study and from listening to other customers what all of the other mysterious and arcane items he might have in stock do and how to use them.

Now, time to move on to meeting two - OS Ambassadors for most of the rest of the week. That's always an exciting meeting. But for both of these, you end up tired! As invigorated as I always am after the meetings, I am also glad to get home!

Sunday Feb 27, 2005

CEC Day 2

Day two of the CEC is Sunday, 2/27. Like other days, we began with general sessions, but I missed the early ones. I "cut class" and went to church at Glide Memorial United Methodist Church. Great service, and I am glad that I went. More about that later.

I finally got to the CEC in time to hear Andy Bechtolsheim and John Fowler's general session about where Sun is going with Opteron servers. David Yen, EVP of Sun's Scalable Systems Group, explained how CMT works and where Sun is going with our upcoming CMT systems.

Right after lunch, I had the second round of my BART presentation. Pretty good turnout, I think - probably about 35 people. BART is one of those little gems in Solaris that people overlook.

After my talk, I caught several good talks this afternoon. The first was on the new way that Sun will distribute updates for Solaris 10; this looks to be a real improvement over the current tools and practices. The second talk was on metering and accounting resource usage for utility computing - in this case, chargeback. The key here is extended accounting and its ability to report usage by task, project, or zone. Exacct is something that I have been intending to look at more closely for a while; now, I think it's time to do that. The third talk was about the Fault Manager in Solaris 10, given by Mike Shapiro. The more I look at FMA and hear the plans for it, the more impressive this technology is.

One more half day tomorrow morning, finishing up with a visit with Scott McNealy. Last year at CEC, Scott (like Jonathan this year) was very open with us. I'm looking forward to that.

But CEC is only the first part of the week for me. Tuesday to Thursday, the OS Ambassadors, a group of roughly 50 Solaris specialists worldwide, will meet for a short mini-conference. We are taking advantage of the fact that most of us are in town for CEC to catch up for a few days. So, it looks like a busy week, too.
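
For anyone else who wants to poke at extended accounting, enabling it appears to be about this simple - the acctadm invocations below are a sketch, and the file locations are just the conventional ones from the documentation:

global# acctadm -e extended -f /var/adm/exacct/task task
global# acctadm -e extended -f /var/adm/exacct/proc process
global# acctadm

Running acctadm with no arguments reports the current accounting configuration, so you can confirm what is being collected before building any chargeback reporting on top of it.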