Thursday Dec 18, 2008

Dutch OpenSolaris mirror

Bart Muijzer reports that the fine folks at NLUUG are now offering the latest OpenSolaris 2008.11 release for download on their site. They rsync daily, so all future OpenSolaris releases will show up there within a day of their release.

Get it here:

Thursday Nov 06, 2008

Support for the Intel 945GME

The fix for the Intel 945GME (6738342) has just been integrated in build 103. This makes X work out of the box on recent mini-notebooks such as my Medion Akoya E1210 (aka MSI Wind 100), the Acer Aspire One and some Eee PC models.

Together with the fix for the rge driver (6717107) that was integrated in build 99 and the fix for the keyboard problem (6695011) that was integrated in build 100, this makes Solaris work almost out-of-the-box on the E1210 (the only thing not yet supported is the wireless interface, but 14 Euros gets you a supported Intel mini PCIe wireless adapter.)

Monday Aug 18, 2008

Running Solaris on the Medion Akoya E1210 - Update

After getting OpenSolaris 2008.11 and Solaris Nevada to run on the Medion Akoya E1210 with some workarounds, I spent some evenings and a rainy Sunday on getting things to work even better.

  • WiFi

    The WiFi chipset in this system is the Ralink RT2790 (pci1814,781) which isn't supported by the ral(7D) driver yet. I filed 6736786 Need support for Ralink RT2790 for this and donated my WiFi card to the folks in Beijing so they have the hardware to test with.

  • Ethernet

    The Realtek 8101E chipset (pci10ec,8136) is not yet supported by the rge(7D) driver in current releases, but it is being worked on (6717107 Need to support Realtek 8102EL and new 8101E variants).

    I built a copy of the driver with the suggested fix listed in the bug, but the driver still failed to work properly on my system with IP hardware checksum offload enabled. It turned out that the chipset is a slightly different version of the 8101E than the 8101E_B listed in CR 6717107. Unlike all other chipsets supported by the rge driver, these variants don't support hardware checksumming, so the driver must disable checksum offload for them. The following patch against the current version of rge(7D) makes this particular chipset work without having to disable IP hardware checksumming globally in /etc/system.

    The day after updating the bug with this patch, I received an email from one of the engineers in Beijing with an updated version asking if I could run the HCTS network test on my system (since they don't have this particular variant). The new driver passes the HCTS, so this should work out of the box once the fix for 6717107 is integrated.

  • Graphics

    Starting X fails horribly:

    (EE) GARTInit: Unable to open /dev/agpgart (Resource temporarily unavailable)
    (WW) intel(0): /dev/agpgart is either not available, or no memory is available
    for allocation.  Using pre-allocated memory only.
    drmOpenDevice: node name is /dev/dri/card0
    drmOpenDevice: open result is -1, (No such device or address)
    drmOpenDevice: open result is -1, (No such device or address)
    drmOpenDevice: Open failed
    drmOpenDevice: node name is /dev/dri/card0
    drmOpenDevice: open result is -1, (No such device or address)
    drmOpenDevice: open result is -1, (No such device or address)
    drmOpenDevice: Open failed
    [drm] failed to load kernel module "i915"
    (II) intel(0): [drm] drmOpen failed
    (EE) intel(0): [dri] DRIScreenInit failed. Disabling DRI.
    (**) intel(0): Framebuffer compression enabled
    (**) intel(0): Tiling enabled
    (==) intel(0): VideoRam: 7932 KB
    (II) intel(0): Attempting memory allocation with tiled buffers.
    (EE) intel(0): Failed to allocate framebuffer. Is your VideoRAM set too low?
    (II) intel(0): Tiled allocation failed.
    (WW) intel(0): Couldn't allocate tiled memory, fb compression disabled
    (II) intel(0): Attempting memory allocation with untiled buffers.
    (WW) intel(0): Failed to allocate EXA offscreen memory.
    (II) intel(0): Untiled allocation failed.
    (EE) intel(0): Couldn't allocate video memory
    
    Fatal server error:
    AddScreen/ScreenInit failed for driver 0
    

    Using the VESA driver by creating an xorg.conf file worked of course, but that is kind of lame (Compiz anyone?). From the error messages it seemed that agpgart and DRI were not supported for the Intel i945GME.

    Looking through the code and the webrevs of the initial agpgart and DRI putbacks, and of the later change that added i965 support, I was able to cook up the following patch that adds agpgart and DRI support for the i945GME. I have filed 6738342 Need support for Intel i945GME to track this. The patch has been lightly tested on my system only, so use with care.
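
    To verify that DRI really is active after applying the patch, glxinfo can be used (assuming the Mesa demos are installed under /usr/X11/bin; the path is an assumption on my part):

    $ /usr/X11/bin/glxinfo | grep "direct rendering"
    direct rendering: Yes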

With Ethernet working, a supported WiFi replacement card in the mail, and Compiz now happily running on this Medion Akoya E1210 mini notebook, I declare victory!

With some DIY (I bought this at a DIY store after all), this EUR 399 mini notebook turns out to be a fine Solaris system.

Saturday Aug 09, 2008

OpenSolaris 2008.11 on the Medion Akoya E1210

With ultra portable notebooks being all the rage and even a Dutch DIY franchise ("Dat zeg ik: GAMMA") selling them, I bought myself a Medion Akoya E1210 mini notebook. This is a rebadged MSI Wind (10" display @ 1024x600, 1.6 GHz Atom N270 CPU, 1 GB memory, 80 GB hard disk, wired and wireless LAN). Rather than using the preinstalled Windows XP, I wanted to run Solaris on this of course. The short version: it almost works.

Since this system does not have a CD to boot from, I created a bootable OpenSolaris USB stick using the usbgen and usbcopy tools from the Distro Constructor project.

$ hg clone ssh://anon@hg.opensolaris.org/hg/caiman/distro_constructor
$ cd distro_constructor/tools
$ su
# ./usbgen /path/to/osol-0811-93.iso /path/to/osol-0811-93.usb /tmp/foo

This will take a couple of minutes. After that, copy the generated image to the USB stick:

# ./usbcopy /path/to/osol-0811-93.usb

Insert the USB stick, power on, press F11, and select the USB stick as the boot device.

This notebook suffers from the same timing issue as the ASUS Eee PC, so as a workaround change the GRUB kernel line to read:

kernel$ /platform/i86pc/kernel/$ISADIR/unix -v

The system then boots and tries to start X, without success (the good news is that the built-in USB webcam is recognized out of the box). The wired ethernet interface (Realtek 8101E) is supported by the rge driver and is plumbed automatically. It doesn't work completely, however: pinging another host works, but any serious networking like ssh fails silently. Disabling IP hardware checksumming fixes that:

# mdb -kw
> ip`dohwcksum/W 0
dohwcksum:      0               =       0x0
^D
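
The mdb change above lasts only until the next reboot; for a persistent (but global) workaround, the same tunable can also be set in /etc/system:

set ip:dohwcksum = 0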

Now that I have networking up, I can save a copy of the original disk contents (just in case) and see if I can get OpenSolaris installed on the disk using a remote display so I don't have to manually apply the above workarounds each time. More on that later.
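
Saving the original disk contents can be as simple as streaming the raw disk over the network from the live environment (a sketch only; the disk device and target host below are examples, not what I actually used):

# dd if=/dev/rdsk/c0d0p0 bs=1024k | gzip | ssh user@somehost 'cat > akoya-disk.img.gz'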

Update: I got X to run using this tip: Indiana: VESA if you need it. It's only VESA but it will at least make X usable until I figure out how to get the Intel driver working...

Monday Dec 24, 2007

Marvell Yukon ethernet and xVM

When xVM was integrated into Nevada in build 75, I immediately tried it on my laptop (a Toshiba M3), only to find out that it didn't work because xVM requires GLD version 3 network drivers. My particular type of M3 unfortunately has a Marvell Yukon gigabit ethernet adapter, and the skge driver from SysKonnect that I had been using for the past years is not a GLD v3 driver.

While looking for a more recent version of skge (hoping for GLD v3 support in skge), I came across the myk driver written by Masayuki Murayama. This driver can be compiled as a GLD v3 driver and a quick look in the install script showed that the PCI ID of my Yukon chip was supported by myk. To compile a GLD v3 version of the driver you'll need the driver sources and a recent copy of the ON sources (for the required GLD header files).

$ gzcat myk-2.5.0.tar.gz | tar xf -
$ cd myk-2.5.0
$ rm Makefile.config
$ ln -s Makefile.config_gld3 Makefile.config
$ vi Makefile.config

Change -I to point to where you keep the ON sources:

#
# Common configuration infomations for all platforms
#
DRV     = myk
include version
DFLAGS  = -DDEBUG -DDEBUG_LEVEL=0 -DGEM_DEBUG_LEVEL=0 \
          -DTX_BUF_SIZE=256 -DRX_BUF_SIZE=256 \
          -I /export/home/ml93401/ws/onnv-gate/usr/src/uts/common   # <- change appropriately
#
CFGFLAGS = -UCONFIG_OO \
           -DGEM_CONFIG_POLLING \
           -DGEM_CONFIG_VLAN -UCONFIG_HW_VLAN \
           -DGEM_CONFIG_CKSUM_OFFLOAD \
           -DGEM_CONFIG_GLDv3 -DSOLARIS10

LDFLAGS += -dy -N misc/mac -N drv/ip

Build and install the driver:

$ make
$ su
# ./adddrv.sh

After a reboot we should have a GLD v3 driver usable for xVM (if the type is not 'legacy' it is a GLD v3 driver):

# dladm show-link
myk0            type: non-vlan  mtu: 1500       device: myk0
iwi0            type: non-vlan  mtu: 1500       device: iwi0

Creating a DomU now works fine (as expected). If only I had more memory in this laptop...

Sunday Sep 23, 2007

Running recent Nevada builds on a VIA EPIA system

At home I use a VIA EPIA system to serve NFS for my other systems. Since I didn't want another noisy system near my desk, I chose a fanless 600 MHz VIA C3 motherboard. This system has been happily running Nevada (and Solaris 10 before that) for quite some time. As it was running an ancient build (snv_48), I decided to upgrade it to some more recent bits.

While trying to install build 71 some weeks ago, I ran into 6591195 segvn_init() may return before checking HAT_SHARED_REGIONS support, where the system panicked with the message "No shared region support on x86". Luckily the fix for that went into snv_72, so when snv_73 became available I had another go. This time the system got a little further before it fell over:

init(1M) exited on fatal signal 9: restarting automatically
init(1M) exited on fatal signal 9: restarting automatically
init(1M) exited on fatal signal 9: restarting automatically

Some searching turned up 6572151 snv boot failure since snv_66 which is a dup of 6332924 snv_24 /usr/ccs/bin/as adds new HWCAP tags to previously untagged object. The problem is that libc.so is now built with HWCAP tags that specify that SSE is required while it is not required per se (if SSE is available it will be used, otherwise it will not be used).

However, since the tag says that SSE is required, a system without SSE support will no longer work. The first consumer of libc.so after boot is /sbin/init. It will fail because libc.so requires SSE, die a horrible death and get restarted (and restarted, and restarted, ...). Running isainfo on my EPIA system shows that it indeed does not have the SSE capability:

$ isainfo -v
32-bit i386 applications
        ahf mmx cx8 tsc fpu

CR 6332924 is currently in the 'Fix in progress' state, so there is no build which includes the fix yet. There is a workaround though: elfedit(1). As the name suggests, elfedit can be used to edit ELF files, and that is what I did. By removing the SSE HWCAP tag from libc.so, I was able to get my EPIA to work with build 73.

Here is what I did to make it work: since we obviously can't easily update libc.so on the DVD, I used LiveUpgrade to create a second boot environment for build 73:

# lucreate -c nv_48 -n nv_73 -m /:/dev/dsk/c0d0s3:ufs
# luupgrade -n nv_73 -u -s /mnt   # DVD mounted on /mnt

Next, I mounted the build 73 BE to access the build 73 copy of /lib/libc.so:

# lumount -n nv_73

elfedit is available from snv_75 onwards so I copied libc.so to another system running a recent nightly build to remove the SSE tag:

$ file /tmp/libc.so
/tmp/libc.so:   ELF 32-bit LSB dynamic lib 80386 Version 1 [SSE CX8 FPU], dynamically linked, not stripped, no debugging information available
$ elfedit -e 'cap:hw1 -and -cmp sse' /tmp/libc.so
$ file /tmp/libc.so
/tmp/libc.so:   ELF 32-bit LSB dynamic lib 80386 Version 1 [CX8 FPU], dynamically linked, not stripped, no debugging information available

I copied the modified libc.so back and activated the nv_73 boot environment:

# luumount nv_73
# luactivate nv_73
# init 6

Success!

(Ali Bahrami, I owe you a beer!)

Wednesday Sep 12, 2007

Resource control observability using kstats

One of the things I sometimes miss when using resource controls is a simple way to see the current usage of a particular resource control by a project or zone. While finding out the limit for the rctl is no problem (for that we have prctl(1)), getting the actual usage requires work and implementation knowledge.

For instance, we could get the amount of System V shared memory used by a project using ipcs -Jam and some parsing of its output. Or fire up mdb(1) and look up the value of kpd_shmmax in the project's kproject_t struct. And if we wanted the usage of another resource control (say the number of lwps), we'd need to use yet another tool (prstat -LJc) or know that the number of lwps is kept in the kpj_nlwps member. Hardly usable for more than the occasional peek. Moreover, relying on kernel implementation details such as these structure members is highly inadvisable as they may change in the future (they probably won't, but they are not stable interfaces, so don't rely on them).
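
The limit side really is trivial; for example (the project name here is just an illustration):

$ prctl -n project.max-shm-memory -i project user.oracle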

The addition of the swap and locked memory resource controls by PSARC 2006/598 Swap resource control; locked memory RM introduced a number of kstats for observability:

  • caps:{zoneid}:swapresv_zone_{zoneid}
  • caps:{zoneid}:lockedmem_zone_{zoneid}
  • caps:{zoneid}:lockedmem_project_{projid}

These kstats have a 'value' statistic for the current limit and a 'usage' statistic that holds the current usage:

$ kstat -c zone_caps -n swapresv_zone_0
module: caps                            instance: 0     
name:   swapresv_zone_0                 class:    zone_caps
        crtime                          0
        snaptime                        102512.50351337
        usage                           532168704
        value                           18446744073709551615
        zonename                        global

Exposing these values as kstats gives us exactly what is needed, a simple, well defined method to get the limit and usage for a resource control.

To satisfy my curiosity and to see what changes would be needed, I spent some evenings creating a prototype that adds kstats for all project.* and zone.* resource controls. The following extra kstats are available in the prototype:

  • caps:{projid}:contracts_project_{projid}
  • caps:{projid}:msgids_project_{projid}
  • caps:{zoneid}:msgids_zone_{zoneid}
  • caps:{projid}:nlwps_project_{projid}
  • caps:{zoneid}:nlwps_zone_{zoneid}
  • caps:{projid}:ntasks_project_{projid}
  • caps:{projid}:semids_project_{projid}
  • caps:{zoneid}:semids_zone_{zoneid}
  • caps:{projid}:shmids_project_{projid}
  • caps:{zoneid}:shmids_zone_{zoneid}
  • caps:{projid}:shmmem_project_{projid}
  • caps:{zoneid}:shmmem_zone_{zoneid}

Getting a list of the current usage of all resource controls is now as simple as typing:

$ kstat -p caps:::usage
caps:0:contracts_project_0:usage        33
caps:0:contracts_project_1:usage        2
caps:0:contracts_project_101:usage      0
caps:0:cryptomem_project_0:usage        0
...
caps:5:nlwps_project_0:usage    108
caps:5:nlwps_zone_5:usage       108
caps:5:ntasks_project_0:usage   15
caps:5:semids_project_0:usage   0
caps:5:semids_zone_5:usage      0
caps:5:shmids_project_0:usage   1
caps:5:shmids_zone_5:usage      1
caps:5:shmmem_project_0:usage   172032
caps:5:shmmem_zone_5:usage      172032
caps:5:swapresv_zone_5:usage    95178752

And now that we have the numbers as kstats, we can use any tool to massage the numbers into a form that suits us. The screenshot below is from a hacked up version of one of the JKstat demo programs and shows a graph of the number of LWPs in all projects and zones during boot and shutdown of a Zone.
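
Even the plain shell will do for a quick usage-versus-limit overview. A rough sketch (nothing prototype-specific beyond the kstat names above, just kstat(1M) plumbing):

#!/bin/ksh
# For every 'usage' statistic in the caps module, print the usage next to its limit.
kstat -p caps:::usage | while read stat usage; do
        rctl=${stat%:usage}                             # e.g. caps:5:nlwps_zone_5
        limit=$(kstat -p ${rctl}:value | awk '{print $2}')
        printf "%-35s %20s / %s\n" "$rctl" "$usage" "$limit"
done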


Wednesday Oct 25, 2006

System V IPC resource controls for Zones

Some weeks ago, I put back my code for 6306668 (RFE: there need to be zone limits for project-based system V resource controls). The fix went into Nevada build 48, which is now available as Solaris Express 10/06 (available here).

Without such zone limits for project-based System V IPC resource controls, a non-global zone administrator could possibly starve other zones by consuming inordinate amounts of System V IPC resources. This is particularly an issue in cases where the non-global zone administrator cannot be trusted (whether through malice or through a lack of knowledge and understanding of the impact of their actions).

The existing zone.* resource controls have been extended with four new resource controls:

  • zone.max-shm-memory - the total amount of shared memory allowed for a zone, expressed as a number of bytes.
  • zone.max-shm-ids - the maximum number of shared memory IDs allowed for a zone, expressed as an integer.
  • zone.max-sem-ids - the maximum number of semaphore IDs allowed for a zone, expressed as an integer.
  • zone.max-msg-ids - the maximum number of message queue IDs allowed for a zone, expressed as an integer.

These resource controls give the global zone administrator the ability to limit the total consumption of System V IPC resources by processes in a zone. The non-global zone administrator is still able to control the allocation of System V IPC resources inside the zone using the existing project.* resource controls. So regardless of the limits that a non-global zone administrator sets on projects in the zone, the total amount of IPC resources used by the zone can never exceed the limit set by the global zone administrator.

Setting these resource controls is done in the usual way using zonecfg(1M):

$ zonecfg -z aap
zonecfg:aap> add rctl
zonecfg:aap:rctl> set name=zone.max-shm-memory
zonecfg:aap:rctl> add value (priv=privileged,limit=1073741824,action=deny)
zonecfg:aap:rctl> end
zonecfg:aap> exit

The limit will be in effect after booting the zone. Adding or changing one of these resource controls to a running zone without rebooting can be done using prctl(1M).

One thing to note is that for compatibility reasons there are no default privileged limits on these resource controls, only a system limit. Having a default privileged limit could break existing configurations, because up to now there was no limit at the zone level. Therefore, adding a limit to a running zone requires you to use the -t privileged option to add the privileged limit.

To add a 1 GB limit to a running zone you would use:

prctl -n zone.max-shm-memory -t privileged -v 1073741824 -i zone aap

Once the privileged limit is present, changing the limit to 2 GB would be done like this:

prctl -n zone.max-shm-memory -r -v 2147483648 -i zone aap
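
To double-check what is currently in effect for the zone, prctl can read the value back (run from the global zone):

prctl -n zone.max-shm-memory -i zone aap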


Thursday Oct 19, 2006

First Dutch OpenSolaris User Group meeting

The first meeting of the Dutch OpenSolaris User Group will be held next Thursday (October 26th) at the Sun office in Amersfoort. This meeting starts at 19:30 and will feature the following speakers:

  • Bart Muijzer, local NLOSUG host, will introduce the NLOSUG, share some ideas, and try to get discussions going.
  • Casper Dik, resident guru, and OpenSolaris Community Advisory Board (CAB) member, will introduce OpenSolaris and be on hand for questions.
  • Darren Moffat, guest guru, will speak on OpenSolaris development in general and one project in particular: encryption for ZFS.
  • Remco Fugers will introduce the OpenSolaris Starter Kit.

More information is here.

See you there!

Saturday Mar 18, 2006

Faster zone provisioning using zoneadm clone

In a recent thread on zones-discuss@opensolaris.org about creating zones in parallel to reduce the time it takes to provision multiple zones, it was suggested that the new zoneadm clone subcommand could be of help. The zoneadm clone subcommand (available from build 33 onwards) copies an installed and configured zone. Cloning a zone is faster than installing a zone, but how much faster? To find out I did some quick experiments creating and cloning both whole root and sparse root zones on a V480:

Creating a whole root zone:

# zonecfg -z zone1
zone1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:zone1> create -b
zonecfg:zone1> set zonepath=/zones/zone1
zonecfg:zone1> exit
# time zoneadm -z zone1 install
Preparing to install zone <zone1>.
Creating list of files to copy from the global zone.
Copying <123834> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <986> packages on the zone.
Initialized <986> packages on zone.
Zone <zone1> is initialized.
Installation of these packages generated errors: 
The file  contains a log of the zone installation.

real    13m40.647s
user    2m49.840s
sys     4m43.221s

Cloning a whole root zone:

# zonecfg -z zone1 export|sed -e 's/zone1/zone2/'|zonecfg -z zone2
zone2: No such zone configured
Use 'create' to begin configuring a new zone.
# time zoneadm -z zone2 clone zone1
Cloning zonepath /zones/zone1...

real    8m4.615s
user    0m9.780s
sys     2m18.334s

For the whole root zone, cloning is almost twice as fast as a regular install.

Creating a sparse root zone:

# zonecfg -z zone3
zone3: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:zone3> create
zonecfg:zone3> set zonepath=/zones/zone3
zonecfg:zone3> exit
# time zoneadm -z zone3 install
Preparing to install zone <zone3>.
Creating list of files to copy from the global zone.
Copying <2535> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <986> packages on the zone.
Initialized <986> packages on zone.
Zone <zone3> is initialized.
Installation of these packages generated errors: 
The file  contains a log of the zone installation.

real    6m3.227s
user    1m45.902s
sys     2m47.717s

Cloning a sparse root zone:

# zonecfg -z zone3 export|sed -e 's/zone3/zone4/'|zonecfg -z zone4
zone4: No such zone configured
Use 'create' to begin configuring a new zone.
# time zoneadm -z zone4 clone zone3
Cloning zonepath /zones/zone3...

real    0m11.535s
user    0m0.706s
sys     0m6.440s

For the sparse root zone, cloning is more than thirty times faster than installing!

So if you need to provision multiple zones of a certain configuration, zoneadm clone is clearly the way to go.
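
And since cloning is mostly an I/O-bound copy, it combines nicely with the original suggestion of provisioning zones in parallel. A sketch of what that could look like (the zone names and the 'template' source zone are made up here; how much parallelism actually helps will depend on the storage):

#!/bin/ksh
# Clone several new zones from one installed 'template' zone, in parallel.
for z in web1 web2 web3; do
        zonecfg -z template export | sed -e "s/template/$z/" | zonecfg -z $z
        zoneadm -z $z clone template &
done
wait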

Note that the current clone operation does not (yet) take advantage of ZFS. To see what ZFS can do for zone cloning, have a look at Mike Gerdts' blog: Zone created in 0.922 seconds. Goodness indeed.


Wednesday May 25, 2005

Monitoring zone boot and shutdown using DTrace

Several people have expressed a desire for a way to monitor zone state transitions such as zone boot or shutdown events. Currently there is no way to get notified when a zone is booted or shut down. One way would be to run zoneadm list -p at regular intervals and parse the output, but this has some drawbacks that make this solution less than ideal:

  • it is inefficient because you are polling for events,
  • you will probably start at least two processes for each polling cycle (zoneadm(1M) and nawk(1)),
  • more importantly, you could miss transitions if your polling interval is too large. Since a zone reboot might take only seconds, you would need to poll often in order not to miss a state change.

A better, much more efficient solution can be built using DTrace, the 'Swiss Army knife of system observability'. As mentioned in this message on the DTrace forum, the zone_boot() function looks like a promising way to get notified when a zone is booted. Listing all FBT probes with the string 'zone_' in their name (dtrace -l -P fbt | grep zone_) turns up another interesting function: zone_shutdown(). To verify that these probes fire when a zone is booted or shut down, let's enable both probes:

# dtrace -n 'fbt:genunix:zone_boot:entry, fbt:genunix:zone_shutdown:entry {}'
dtrace: description 'fbt:genunix:zone_boot:entry, fbt:genunix:zone_shutdown:entry ' matched 2 probes

When zoneadm -z zone1 boot is executed we see that the zone_boot:entry probe fires:

CPU     ID                    FUNCTION:NAME
  0   6722                  zone_boot:entry

The zone_shutdown:entry probe fires when the zone is shut down (either by zoneadm -z zone1 halt or by running init 0 from within the zone):

  0   6726              zone_shutdown:entry

This gives us the basic 'plumbing' for the monitoring script. By instrumenting the zone_boot() and zone_shutdown() functions with the FBT provider we can wait for zone boot and shutdown with almost zero overhead. Now what is left is finding out the name of the zone that was booted or shutdown. This requires some knowledge of the implementation and access to the source (anyone interested can take a look at the source after OpenSolaris is launched, so stay tuned).

A quick look at the source shows that we can get the zone name by instrumenting a third function, zone_find_all_by_id(), which is called by both zone_boot() and zone_shutdown(). This function returns a pointer to a zone_t structure (defined in /usr/include/sys/zone.h). The DTrace script below uses a common DTrace idiom: in the :entry probe we set a thread-local variable trace that is used as a predicate in the :return probes (the :return probes have the information we're after). The FBT provider :return probe stores the function return value in args[1], so we can access the zone name as args[1]->zone_name in fbt:genunix:zone_find_all_by_id:return and save it for later use in fbt:genunix:zone_boot:return and fbt:genunix:zone_shutdown:return.

#!/usr/sbin/dtrace -qs

self string name;

fbt:genunix:zone_boot:entry
{
        self->trace = 1;
}

fbt:genunix:zone_boot:return
/self->trace && args[1] == 0/
{
        printf("Zone %s booted\\n", self->name);
        self->trace = 0;
        self->name = 0;
}

fbt:genunix:zone_shutdown:entry
{
        self->trace = 1;
}

fbt:genunix:zone_shutdown:return
/self->trace && args[1] == 0/
{
        printf("Zone %s shutdown\\n", self->name);
        self->trace = 0;
        self->name = 0;
}

fbt:genunix:zone_find_all_by_id:return
/self->trace/
{
        self->name = stringof(args[1]->zone_name);
}

Starting the script and booting and shutting down some Zones gives the following result:

# ./zonemon.d
Zone aap booted
Zone noot booted
Zone noot shutdown
Zone noot booted
Zone aap shutdown

So there you have it, a simple DTrace script that will efficiently wait for zone boot and shutdown events. Enjoy.

Technorati Tag: Solaris

Technorati Tag: DTrace

Thursday Mar 24, 2005

Which Polyhedral Are You?

I am a d10

Take the quiz at dicepool.com

So true.

Thursday Mar 10, 2005

Dynamic Zone changes

This subject has come up several times in the last two weeks, so it might be a good opportunity to finally start using my blog.

When talking to a colleague about Zones, he said: 'I have been looking at Zones, and while they are cool, they are also "static". To add an extra file system to a running zone I have to restart the zone.' Well, as it happens, this is not required. You can dynamically add a file system to a running zone. Here's how:

The current configuration of the running zone looks like this:

# zonecfg -z zone1 info
zonepath: /export/zones/zone1
autoboot: true
pool: large
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr
net:
        address: 129.159.206.38/26
        physical: hme0
rctl:
        name: zone.cpu-shares
        value: (priv=privileged,limit=10,action=none)

Adding a new UFS file system to this zone would entail the following: create a new file system in the global zone, add an fs resource to the zone configuration and restart the zone to re-read the configuration.

global # newfs /dev/md/rdsk/d100
newfs: construct a new file system /dev/md/rdsk/d100: (y/n)? y
Warning: 1280 sector(s) in last cylinder unallocated
/dev/md/rdsk/d100:      1024000 sectors in 712 cylinders of 15 tracks, 96 sectors
        500.0MB in 45 cyl groups (16 c/g, 11.25MB/g, 5440 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 23168, 46304, 69440, 92576, 115712, 138848, 161984, 185120, 208256,
 806720, 829856, 852992, 876128, 899264, 922400, 945536, 968672, 991808,
 1014944,
global # zonecfg -z zone1
zonecfg:zone1> add fs
zonecfg:zone1:fs> set dir=/u01
zonecfg:zone1:fs> set special=/dev/md/dsk/d100
zonecfg:zone1:fs> set raw=/dev/md/rdsk/d100
zonecfg:zone1:fs> set type=ufs
zonecfg:zone1:fs> end
zonecfg:zone1> exit

At this point we could reboot the zone and have the new file system mounted during zone boot. However, there is no need to restart the zone, because the file system can be mounted into the running zone from the global zone. The only thing we have to do now is create the mount point and do the mount ourselves:

global # mkdir /export/zones/zone1/root/u01
global # mount /dev/md/dsk/d100 /export/zones/zone1/root/u01

Note that there is an extra /root/ component in the path to the file system. Inside the zone we see that the new file system has appeared:

zone1 # df -h
Filesystem             size   used  avail capacity  Mounted on
/                       15G   3.3G    11G    23%    /
/dev                    15G   3.3G    11G    23%    /dev
/lib                    15G   3.3G    11G    23%    /lib
/platform               15G   3.3G    11G    23%    /platform
/sbin                   15G   3.3G    11G    23%    /sbin
/usr                    15G   3.3G    11G    23%    /usr
proc                     0K     0K     0K     0%    /proc
ctfs                     0K     0K     0K     0%    /system/contract
swap                   8.3G   264K   8.3G     1%    /etc/svc/volatile
mnttab                   0K     0K     0K     0%    /etc/mnttab
fd                       0K     0K     0K     0%    /dev/fd
swap                   8.3G     0K   8.3G     0%    /tmp
swap                   8.3G    32K   8.3G     1%    /var/run
/u01                   469M   1.0M   421M     1%    /u01

But wait, there's more. The same 'magic' can be applied to add an extra network interface to a running zone. Instead of adding a net resource to the zone configuration and then rebooting the zone, we add the net resource to the zone configuration (to make the change persistent) and then use ifconfig(1M) from the global zone to add the network interface dynamically.

global # zonecfg -z zone1
zonecfg:zone1> add net
zonecfg:zone1:net> set physical=hme0
zonecfg:zone1:net> set address=192.168.1.13/24
zonecfg:zone1:net> end
zonecfg:zone1> exit
global # ifconfig hme0 addif 192.168.1.13 netmask + broadcast + zone zone1 up
Created new logical interface hme0:3
Setting netmask of hme0:3 to 255.255.255.0

The key point here is the 'zone' option of ifconfig. Running ifconfig -a inside the zone shows that we now have the extra network interface. And without having to reboot the zone!

zone1 # ifconfig -a
lo0:5: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
hme0:2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.159.206.38 netmask ffffffc0 broadcast 129.159.206.63
hme0:3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.1.13 netmask ffffff00 broadcast 192.168.1.255

There are more things that can be changed dynamically such as resource controls and pool binding. I'll leave that for another blog entry.
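
As a quick teaser, bumping a resource control or moving the running zone to another pool looks something like this (the pool name 'small' is just an example):

global # prctl -n zone.cpu-shares -r -v 20 -i zone zone1
global # poolbind -p small -i zoneid zone1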

So: Zones are cool and dynamic!
