Wednesday Jan 20, 2010

The EU clears Oracle's acquisition of Sun Microsystems

The EU press release is here.

Tuesday Jan 19, 2010

Solaris 10 and OpenSolaris on the same zfs root pool

Sharing more info.. Here are some hints for installing both Solaris 10 and OpenSolaris on the same zfs rpool.

I did this on a VirtualBox guest.. I did a fresh install of s10u8 on a zfs root (you need to use the text installer or do a net-based install to install on a zfs root).

I then booted an OpenSolaris iso, and after setting my hostname and hostid, ran the create-be script to create a new OpenSolaris BE on the S10's zfs root.

DISCLAIMER: this is totally unsupported by Sun, could mess up your system, etc. etc.

Write down your hostname, hostid, IP addr, netmask, gateway, and NIS domain.

	bash-3.00# uname -a
	SunOS unknown 5.10 Generic_141445-09 i86pc i386 i86pc
	bash-3.00# hostid
	10fa4034
	bash-3.00# echo "hw_serial,0xa?B" | mdb -k
	hw_serial:
	hw_serial:32 38 34 38 33 35 38 39 32 0
	bash-3.00#
Boot your OpenSolaris iso.. Set your hostname and hostid.
        jack@opensolaris:~$ pfexec su -
	root@opensolaris:~# hostname unknown
	root@unknown:~# hostid
	00041f55
	root@unknown:~# echo "hw_serial/v 32 38 34 38 33 35 38 39 32 0" | mdb -kw
	root@unknown:~# hostid
	10fa4034
	root@unknown:~# 
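The mdb write above is just poking the ASCII decimal string of the desired hostid into hw_serial, with a NUL terminator. Here's a quick sketch (a hypothetical helper, not part of create-be) of how those byte values are derived from the hostid you wrote down:

```python
def hw_serial_bytes(hostid_hex):
    """Return the byte values (as hex strings) to write into hw_serial
    with mdb -kw: the ASCII decimal representation of the hostid,
    NUL-terminated."""
    decimal = str(int(hostid_hex, 16))        # "10fa4034" -> "284835892"
    vals = ["%x" % ord(c) for c in decimal]   # ASCII codes of each digit, in hex
    return " ".join(vals + ["0"])             # trailing NUL

# The value written for hostid 10fa4034 above:
print("hw_serial/v " + hw_serial_bytes("10fa4034"))
# -> hw_serial/v 32 38 34 38 33 35 38 39 32 0
```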
Get the create-be script...
        root@opensolaris:~# wget http://blogs.sun.com/mrj/resource/create-be
        root@opensolaris:~# chmod a+x create-be
Import your Solaris 10 rpool... Create the OpenSolaris BE (we'll need to manually create the menu.lst entry later, since the script doesn't handle s10 menu.lst entries right now).
        root@opensolaris:~# zpool import rpool
        root@opensolaris:~# /root/create-be --build=129 --bename=osol129 
Add your new BE into your menu.lst. E.g., here is my entry.
  title osol129
  findroot (pool_rpool,0,a)
  bootfs rpool/ROOT/snv129
  kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS
  module$ /platform/i86pc/$ISADIR/boot_archive
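The fields of that entry follow a fixed pattern, so generating one for any BE is mechanical. A minimal sketch (hypothetical helper, not the create-be code; the pool and dataset names are just the values from my entry above):

```python
def grub_entry(title, pool, dataset):
    """Build a zfs-root GRUB menu.lst entry for a Solaris/OpenSolaris BE.
    findroot takes the pool signature (pool_<poolname>), and bootfs
    names the BE's root dataset."""
    return "\n".join([
        "title %s" % title,
        "findroot (pool_%s,0,a)" % pool,
        "bootfs %s/ROOT/%s" % (pool, dataset),
        "kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS",
        "module$ /platform/i86pc/$ISADIR/boot_archive",
    ])

print(grub_entry("osol129", "rpool", "snv129"))
```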
Install a newer version of grub
        root@opensolaris:~# /mnt/sbin/installgrub /mnt/boot/grub/stage1 /mnt/boot/grub/stage2 /dev/rdsk/c0d0s0
Now, reboot into your OpenSolaris BE, configure it, and look around. You can reboot back into s10 at any time...
root@unknown:~# uname -a
SunOS unknown 5.11 snv_129 i86pc i386 i86pc Solaris
root@unknown:~# 

root@unknown:~# zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
rpool                      10.4G  28.8G    40K  /rpool
rpool/ROOT                 8.37G  28.8G    21K  legacy
rpool/ROOT/s10x_u8wos_08a  3.67G  28.8G  3.67G  /
rpool/ROOT/snv129          4.70G  28.8G  4.70G  /mnt
rpool/dump                 1.00G  28.8G  1.00G  -
rpool/export                 44K  28.8G    23K  /export
rpool/export/home            21K  28.8G    21K  /export/home
rpool/swap                    1G  29.8G    16K  -
root@unknown:~# 

root@unknown:~# beadm list
BE             Active Mountpoint Space Policy Created          
--             ------ ---------- ----- ------ -------          
s10x_u8wos_08a R      -          3.67G static 2010-01-13 12:02 
snv129         N      /          4.70G static 2010-01-14 14:24 
root@unknown:~# beadm activate s10x_u8wos_08a
root@unknown:~# reboot

...

bash-3.00# uname -a
SunOS unknown 5.10 Generic_141445-09 i86pc i386 i86pc
bash-3.00# zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
rpool                      10.4G  28.8G    40K  /rpool
rpool/ROOT                 8.37G  28.8G    21K  legacy
rpool/ROOT/s10x_u8wos_08a  3.67G  28.8G  3.67G  /
rpool/ROOT/snv129          4.70G  28.8G  4.70G  /mnt
rpool/dump                 1.00G  28.8G  1.00G  -
rpool/export                 44K  28.8G    23K  /export
rpool/export/home            21K  28.8G    21K  /export/home
rpool/swap                    1G  29.8G    16K  -
bash-3.00#

Thursday Dec 17, 2009

Switching from Nevada to OpenSolaris

I've fixed another bug (findroot grub menu entry) in the create-be script and added support for additional zpools (i.e., along with rpool, other zpools should now work seamlessly across Nevada and OpenSolaris BEs). NOTE: for additional zpools, the Nevada and OpenSolaris BEs should be the exact same build.

The script lets you create a new (non-COW) OpenSolaris BE on a Nevada zfs root based system (or an OpenSolaris system). You can use this to transition a Nevada zfs root based system to OpenSolaris. You can also choose to install an arbitrary OpenSolaris build (e.g. if you want to downgrade).

DISCLAIMER: this is totally unsupported by Sun, could mess up your system, etc. etc.

I strongly recommend creating a scratch BE to run this script out of, in case something goes wrong.

You can grab an updated copy here. Remember the DISCLAIMER above.. Back up your data first!

For folks inside of SWAN, here's a cheatsheet...

Write down your IP addr, netmask, gateway, NIS domain

Create & switch to a scratch BE

# lucreate -n scratch-be
# luactivate scratch-be
# init 6
Install the new OpenSolaris BE (this is not an upgrade, it's a fresh install into a BE on the same rpool as your current BEs)
# pkgadd -d /net/girltalk2/export/mrj/pkg-gate/packages/i386/ \
  SUNWipkg SUNWpython-ply SUNWpython-pycurl
# wget http://blogs.sun.com/mrj/resource/create-be
# chmod a+x create-be
# /root/create-be --build=129 --bename=osol129 --repo=http://ipkg.sfbay/dev --menu="osol129"
If you want to install additional software, e.g. to set up a build machine
# zfs set mountpoint="/mnt" rpool/ROOT/osol129
# zfs mount rpool/ROOT/osol129
# pkg -R /mnt set-publisher -O http://ipkg.sfbay/extra extra
# pkg -R /mnt install developer/opensolaris/osnet@0.5.11-0.129
# zfs umount rpool/ROOT/osol129
If you want to be able to build xvm-gate
# zfs set mountpoint="/mnt" rpool/ROOT/osol129
# zfs mount rpool/ROOT/osol129
# pkg -R /mnt install \
  SUNWgmake@3.81-0.129 \
  SUNWbcc@0.16.17-0.129 \
  SUNWgnu-readline@5.2-0.129 \
  SUNWxwinc@0.5.11-0.129 \
  SUNWgnome-common-devel@0.5.11-0.129 \
  SUNWlibtool@1.5.22-0.129 \
  SUNWgnu-automake-110@1.10-0.129 \
  SUNWaconf@2.63-0.129 \
  SUNWgit@1.5.6.5-0.129 \
  SUNWxvm@3.3.2-0.129
# zfs umount rpool/ROOT/osol129
Reboot into your new BE, configure the network, etc., and migrate over other BE settings, e.g. the sshd config.
# bootadm list-menu
# bootadm set-menu default=....your-osol129-menu-number....
# reboot
: run through sysconfig, reboot
: login
# beadm mount ....your-old-be.... /mnt
# cp /mnt/etc/ssh/sshd_config /etc/ssh/
# cp /mnt/etc/ssh/*key* /etc/ssh/
# svcadm refresh ssh;svcadm restart ssh
: migrate over other customizations you might have
# beadm umount ....your-old-be....

Friday Nov 20, 2009

Install Nevada and OpenSolaris on the same zfs rpool

I wrote a relatively simple python script, here, which will allow you to install OpenSolaris on an existing Nevada, zfs root based x86 system. They will use different root filesystems (e.g. rpool/ROOT/*) of course, but will share the other filesystems (e.g. /export).

DISCLAIMER: this is totally unsupported by Sun, could mess up your system, etc. etc. It works great for me.. But may not for you. :-)

You can also use this script on an OpenSolaris system to install a new Boot Environment (BE) from scratch. When you do a beadm create, OpenSolaris will create a copy-on-write (COW) clone of the current filesystem (which is what you want normally). There are cases where you may not want to clone your existing config. E.g. if you really messed up your system configuration, you can re-install into a new BE (which lets you keep the old config around) instead of re-installing from scratch. Or you can use this to install an older build in a new BE, say if you need to test something in an older build and you didn't keep any of your older BEs around. Here's an example invocation...

create-be --build=121 --bename=osol-121 --repo=http://pkg.opensolaris.org/dev

To install OpenSolaris on a Nevada system, you first need to install the OpenSolaris packaging system on your Nevada system. I'm not going to go over how to build the ipkg gate, but this should get you going in the right direction if you can't find binary packages.

hg clone ssh://anon@hg.opensolaris.org/hg/pkg/gate pkg-gate
** build everything
cd packages
pkgadd -d . SUNWipkg SUNWpython-ply SUNWpython-pycurl
You need to be using a zfs root of course.. i.e.
root@pico:~# df /
Filesystem           1K-blocks      Used Available Use% Mounted on
rpool/ROOT/snv_127 ...
Now use the script to create a new OpenSolaris BE on your Nevada rpool, giving it a BE name and telling it to create a grub menu entry. This will take a while :-).. You can override the default build (--build=) or point to a different repo (--repo=)
create-be --bename=osol-127 --menu="OpenSolaris b127"
This script does a fresh install... So you will have to reconfigure the system when you reboot into the new BE (the current BE is unaffected). The only settings I presently migrate over are the grub menu entry and the console settings in bootenv.rc.

Once the install completes (again, it takes a while), you can reboot into the OpenSolaris BE.. A note here.. you should update the grub menu so the default menu entry is the newly added one. This is because you will reboot into the new BE, configure it, and it will reboot again after it completes configuring the OpenSolaris BE. fastreboot will then boot into the original BE if you don't change the default grub menu entry. Also, make sure you do a reboot -p!!!!

-bash-4.0# bootadm list-menu
the location for the active GRUB menu is: /rpool/boot/grub/menu.lst
default 0
timeout 10
0 Solaris Express Community Edition snv_127 X86
1 Solaris failsafe
2 OpenSolaris b127
-bash-4.0# bootadm set-menu default=2
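bootadm set-menu is just rewriting the default line in the active menu.lst. If you ever need to do the equivalent edit by hand, it amounts to something like this sketch (a hypothetical helper operating on the menu text, not what bootadm actually runs):

```python
def set_default_entry(menu_text, n):
    """Rewrite the 'default N' line of a GRUB menu.lst, leaving every
    other line untouched (the effect of bootadm set-menu default=N)."""
    out = []
    for line in menu_text.splitlines():
        if line.startswith("default "):
            line = "default %d" % n
        out.append(line)
    return "\n".join(out)

menu = "default 0\ntimeout 10"
print(set_default_entry(menu, 2))
```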

Now that you have booted and configured your OpenSolaris BE, let's poke around... Run beadm list to make sure you actually booted into the right BE first :-). NOTE: OpenSolaris thinks the Nevada BE is an OpenSolaris BE, which works out nicely... I like to copy over my ssh keys since I use the same IP address for the Nevada BE and the OpenSolaris BE. I'll also create a new OpenSolaris BE using beadm..

root@pico:/etc/ssh# beadm list
BE       Active Mountpoint Space Policy Created          
--       ------ ---------- ----- ------ -------          
osol-127 NR     /          3.79G static 2009-11-20 20:03 
snv_127  -      -          7.97G static 2009-11-19 10:48
root@pico:/etc/ssh# beadm mount snv_127 /mnt
root@pico:/etc/ssh# cp /mnt/etc/ssh/*key* /etc/ssh/
root@pico:/etc/ssh# cp /mnt/etc/ssh/sshd_config /etc/ssh/
root@pico:/etc/ssh# beadm umount snv_127
root@pico:/etc/ssh# svcadm refresh ssh
root@pico:/etc/ssh# svcadm restart ssh
root@pico:/etc/ssh# beadm create test-osol-be
root@pico:/etc/ssh# beadm list
BE           Active Mountpoint Space Policy Created          
--           ------ ---------- ----- ------ -------          
osol-127     NR     /          3.79G static 2009-11-20 20:03 
snv_127      -      -          7.97G static 2009-11-19 10:48 
test-osol-be -      -          51.0K static 2009-11-21 06:54 
Finally, let's reboot back into Nevada and look around (I love fast reboot).
root@pico:/etc/ssh# beadm activate snv_127
root@pico:/etc/ssh# reboot
Nov 21 06:57:26 pico reboot: initiated by root on /dev/console
syncing file systems... done
rebooting...
SunOS Release 5.11 Version snv_127 32-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: pico
Reading ZFS config: done.
Mounting ZFS filesystems: (7/7)

pico console login: root
Password: 
Nov 21 06:59:06 pico login: ROOT LOGIN /dev/console
Last login: Sat Nov 21 05:10:55 on console
Sun Microsystems Inc.   SunOS 5.11      snv_127 November 2008
-bash-4.0# beadm list
-bash: beadm: command not found
-bash-4.0# lustatus
ERROR: No boot environments are configured on this system
ERROR: cannot determine list of all boot environment names
-bash-4.0# zfs list
NAME                      USED  AVAIL  REFER  MOUNTPOINT
rpool                    13.2G   132G  35.5K  /rpool
rpool/ROOT               11.8G   132G    21K  legacy
rpool/ROOT/osol-127      3.80G   132G  3.79G  /
rpool/ROOT/snv_127       7.97G   132G  7.97G  /
rpool/ROOT/test-osol-be    51K   132G  3.79G  /
rpool/dump                959M   132G   959M  -
rpool/export               44K   132G    23K  /export
rpool/export/home          21K   132G    21K  /export/home
rpool/swap                512M   133G  9.05M  -
-bash-4.0# df /
/                  (rpool/ROOT/snv_127):277843413 blocks 277843413 files
-bash-4.0# 

Tuesday Sep 15, 2009

Installing a small xVM or VirtualBox OpenSolaris guest

I have a little python script that I use for personal use to do some small OpenSolaris guest installs. I use it for VirtualBox and xVM guests, but it should work fine for bare metal too, assuming you add the correct drivers needed for your system.

It's a simple text based installer. Now, I'm not much of a Python coder; the script is something I play around with in my spare time, it's not finished, and it's not supported by Sun, etc, etc.. :-) But I thought some folks would find it useful, so I'm sharing it.

A 2009.06 based xVM install comes out to around 380M vs 3G+ for a standard install. b122 is quite a bit bigger due to some dependency bloat. For an xVM guest, boot the 2009.06 iso, login as jack, grab the installer and run it...

: core2[1]#; virt-install -n opensolaris -r 1024 -p --nographics -l /net/192.168.0.71/tank/isos/solaris/os2009.06.iso -f /vdisks/opensolaris 
[CUT]
opensolaris console login: jack
Password: 
Last login: Tue Sep 15 05:19:57 from core2.lan
Sun Microsystems Inc.   SunOS 5.11      snv_111b        November 2008
jack@opensolaris:~$ wget http://blogs.sun.com/mrj/resource/slim-guest-installer
[CUT]
05:28:04 (69.94 KB/s) - `slim-guest-installer' saved [16096/16096]
jack@opensolaris:~$ chmod a+x slim-guest-installer 
jack@opensolaris:~$ pfexec ./slim-guest-installer

Thanks for choosing to install the OpenSolaris OS! Before you start, review
the Release Notes for this release for a list of known problems. The release
notes can be found at
   http://opensolaris.org/os/project/indiana/resources/relnotes/200906/x86

****
NOTICE: THIS INSTALLER ONLY SUPPORTS INSTALLING TO A WHOLE DISK. ALL DATA
ON THE DISK YOU INSTALL TO WILL BE DESTROYED.
****

Please Select Install Disk

  AVAILABLE DISK SELECTIONS:
	0.  /dev/dsk/c7t0d0p0  21459755520 bytes
Specify disk (enter its number or 'q' to quit): 0

NOTE: ALL DATA ON THIS DISK WILL BE DESTROYED.

Install on /dev/rdsk/c7t0d0p0 (yes or no): yes
Configuring ZFS Root:................ COMPLETE
Installing packages...
DOWNLOAD                                    PKGS       FILES     XFER (MB)
SUNWopenssl                                39/67   6042/8542   57.27/93.32
[CUT]

For VirtualBox, boot the 2009.06 iso, login as jack, grab the installer and run it with an additional option (--profile=vbox-guest) to specify the VirtualBox packages.

opensolaris console login: jack
Password: 
Last login: Tue Sep 15 05:19:57 from core2.lan
Sun Microsystems Inc.   SunOS 5.11      snv_111b        November 2008
jack@opensolaris:~$ wget http://blogs.sun.com/mrj/resource/slim-guest-installer
[CUT]
05:28:04 (69.94 KB/s) - `slim-guest-installer' saved [16096/16096]
jack@opensolaris:~$ chmod a+x slim-guest-installer 
jack@opensolaris:~$ pfexec ./slim-guest-installer --profile=vbox-guest

Have Fun!

Friday Apr 10, 2009

How small can Solaris go (part 2)

First, a couple of answers to some questions from the first post..

James, re: "be able (after a PXE boot) to mount NFS or iSCSI and switch (or layer) root". No. I'm really going for a standalone, extremely stripped down, ramdisk based image right now. No swap. No disk support. This wouldn't be something you would use for a NAS appliance. For that, you would be better off going with a stripped down pkg based OpenSolaris image. I can get those down to ~350M using a "custom installer".

Benoit, re: "And in a memory usage how light can solaris go?" Good question, let's take a look :-)

When I do get a little free time to play around with this stuff, I generally do it using OpenSolaris xVM domUs since it takes about a second to test then reboot these images.

The last post I did was an i86pc based image.. Here's a domU with the same bits loaded.. We're at ~15M for disk usage.

Here's the py file for my domU.. Notice I'm using the dom0's unix (since I'm building the ramdisk based on dom0's bits).

: alpha[1]#; cat /tank/guests/micro/guest.py 
name = "micro"
vcpus = 1
memory = "256"
kernel = "/platform/i86xpv/kernel/unix"
ramdisk = "/tank/guests/micro/ramdisk"
extra = "/platform/i86xpv/kernel/unix"
vif = ['']
on_shutdown = "destroy"
on_reboot = "restart"
on_crash = "preserve"
: alpha[1]#; 

: alpha[1]#; xm create -c /tank/guests/micro/guest.py
Using config file "/tank/guests/micro/guest.py".
Started domain micro
v3.3.2-rc1-pre-xvm chgset 'Mon Apr 06 20:13:29 2009 -0400 18424:97250633e58b'
SunOS Release 5.11 Version onnv-3.3-mrj 32-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
strplumb: failed to initialize drv/dld
# df -lk
Filesystem            kbytes    used   avail capacity  Mounted on
/ramdisk:a             38255   15116   19314    44%    /
/devices                   0       0       0     0%    /devices
/dev                       0       0       0     0%    /dev
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                  205012       0  205012     0%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
sharefs                    0       0       0     0%    /etc/dfs/sharetab
#
Now, we are really stripped down (for Solaris)... Let's add in kmdb and enough of mdb to let us do an mdb -K.

I have a very very ugly python script I use to build up my ramdisk...

: alpha[1]#; ./micro.py 
USAGE: ./micro.py cfg disk sizeM
Here's the config file I'm using for my domU after uncommenting kmdb... Notice I don't even have syscalls in at this point...
: alpha[1]#; cat domu.files
@kmdb32

@kernel32
@i86xpv32
@init32
#@syscall32

#@mount
#@uidcache

#@net32
#/usr/bin/ln

#@devfsadm32
#@basic32
#@ssh32
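The format is simple: @name lines pull in a named group of files, bare paths pull in a single file, and # comments a line out. A parser sketch (my reading of the format from the sample above; micro.py itself isn't published, so this is illustrative only):

```python
def parse_files_cfg(text):
    """Split a domu.files-style config into named file groups (@name)
    and individual file paths, skipping blank and commented-out lines."""
    groups, paths = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                    # blank or commented out: ignored
        if line.startswith("@"):
            groups.append(line[1:])     # named group, e.g. kmdb32
        else:
            paths.append(line)          # explicit file path
    return groups, paths

cfg = "@kmdb32\n@kernel32\n#@syscall32\n/usr/bin/ln"
print(parse_files_cfg(cfg))
```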
I just luupgraded my system to b112.. Let's build a new image using that... So how much memory are we using?
: alpha[1]#; ./micro.py domu.files /tank/guests/micro/ramdisk 40
NOTICE: overwriting disk: /tank/guests/micro/ramdisk

: alpha[1]#; xm create -c /tank/guests/micro/guest.py 
Using config file "/tank/guests/micro/guest.py".
Started domain micro
v3.1.4-xvm chgset 'Mon Mar 30 23:29:09 2009 -0700 15914:bb9557896640'
SunOS Release 5.11 Version snv_112 32-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
NOTICE: Invalid iBFT table 0x1
strplumb: failed to initialize drv/dld
# df -lk
Filesystem            kbytes    used   avail capacity  Mounted on
/ramdisk:a             38255   19246   15184    56%    /
/devices                   0       0       0     0%    /devices
/dev                       0       0       0     0%    /dev
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                  204992       0  204992     0%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
sharefs                    0       0       0     0%    /etc/dfs/sharetab
# mdb -K

Welcome to kmdb
kmdb: no terminal data available for TERM=
kmdb: failed to set terminal type to `', using `vt100'
Loaded modules: [ scsi_vhci mac xpv_psm ufs unix krtld genunix specfs xpv_uppc
 ]
[0]> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                       8922                34   14%
Anon                          140                 0    0%
Exec and libs                  78                 0    0%
Page cache                    550                 2    1%
Free (cachelist)              708                 2    1%
Free (freelist)             53089               207   84%
Balloon                         0                 0    0%

Total                       63487               247
[0]> 
Less than 40M... Not great, but not bad either... There are some code changes we could do to get things smaller, but it's not big enough that it would matter at this point... Since I'm building a 40M ramdisk, we need space for that too.. So I would need around 80-90M total for this image..

Let's do a quick test, changing the domU's memory to 80M.

< memory = "256"

> memory = "80"

: alpha[1]#; ./micro.py domu.files /tank/guests/micro/ramdisk 40
NOTICE: overwriting disk: /tank/guests/micro/ramdisk
: alpha[1]#; xm create -c /tank/guests/micro/guest.py 
Using config file "/tank/guests/micro/guest.py".
Started domain micro
v3.1.4-xvm chgset 'Mon Mar 30 23:29:09 2009 -0700 15914:bb9557896640'
SunOS Release 5.11 Version snv_112 32-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
NOTICE: Invalid iBFT table 0x1
strplumb: failed to initialize drv/dld
# mdb -K
WARNING: retrying of kmdb allocation of 0x600000 bytes

Welcome to kmdb
kmdb: no terminal data available for TERM=
kmdb: failed to set terminal type to `', using `vt100'
Loaded modules: [ scsi_vhci mac xpv_psm ufs unix krtld genunix specfs xpv_uppc
 ]
[0]> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                       5766                22   31%
Anon                          140                 0    1%
Exec and libs                   0                 0    0%
Page cache                      1                 0    0%
Free (cachelist)             1259                 4    7%
Free (freelist)             11265                44   61%
Balloon                         0                 0    0%

Total                       18431                71
[0]> 
Interesting.. What about a 64-bit kernel? It's going to be bigger, of course. But if you need a 64-bit kernel, memory shouldn't be an issue.
: alpha[1]#; xm create -c /tank/guests/micro/guest64.py 
Using config file "/tank/guests/micro/guest64.py".
Started domain micro
v3.1.4-xvm chgset 'Mon Mar 30 23:29:09 2009 -0700 15914:bb9557896640'
SunOS Release 5.11 Version snv_112 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
strplumb: failed to initialize drv/dld
# df -lk
Filesystem            kbytes    used   avail capacity  Mounted on
/ramdisk:a             38255   31187    3243    91%    /
/devices                   0       0       0     0%    /devices
/dev                       0       0       0     0%    /dev
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                  195480       0  195480     0%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
sharefs                    0       0       0     0%    /etc/dfs/sharetab
# mdb -K

Welcome to kmdb
kmdb: no terminal data available for TERM=
kmdb: failed to set terminal type to `', using `vt100'
Loaded modules: [ scsi_vhci mac xpv_psm ufs unix krtld genunix specfs xpv_uppc
 ]
[0]> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      15575                60   25%
Anon                          192                 0    0%
Exec and libs                  93                 0    0%
Page cache                    771                 3    1%
Free (cachelist)             1203                 4    2%
Free (freelist)             45653               178   72%
Balloon                         0                 0    0%

Total                       63487               247
[0]> 
OK, now let's do something interesting... Let's bring up networking enough so we can ping, etc. In my domu.files file, I'm going to bring in @syscall32, @mount, @net32, and /usr/bin/ln.

One thing you notice when you start playing with this stuff is that things can grow very fast when you start pulling in user bins (due to all the libraries which get pulled in too). You would think things like reboot and poweroff would have a small impact. Not so :-)
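The growth comes from the transitive closure of library dependencies: each binary pulls in its libraries, and those libraries pull in more. A toy sketch of the closure computation (the dependency map here is made up for illustration; on a real system you'd populate it from ldd output):

```python
def closure(bins, deps):
    """Transitive closure of library dependencies: everything a set of
    binaries ultimately drags onto the ramdisk. deps maps each object
    to the objects it links against."""
    needed, todo = set(), list(bins)
    while todo:
        obj = todo.pop()
        if obj in needed:
            continue            # already accounted for
        needed.add(obj)
        todo.extend(deps.get(obj, []))
    return needed

# Hypothetical dependency map, for illustration only.
deps = {
    "/sbin/reboot": ["libc.so.1", "libbsm.so.1"],
    "libbsm.so.1": ["libmd.so.1", "libsecdb.so.1"],
    "libsecdb.so.1": ["libnsl.so.1"],
}
print(sorted(closure(["/sbin/reboot"], deps)))
```

Even in this toy map, one "small" binary drags in five libraries.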

I have a custom init bin which configures the system and starts up a shell (no SMF, etc). I'll manually configure it though so you can see what I'm doing to get the system up.

Notice I'm setting up a dev link for the NIC.. We don't have devfsadm in this particular ramdisk. Obviously you would pre-create the link on the ramdisk, or pull in the devfsadm bits.. But I thought it was an interesting thing to show.

: alpha[1]#; xm create -c /tank/guests/micro/guest.py 
Using config file "/tank/guests/micro/guest.py".
Started domain micro
v3.1.4-xvm chgset 'Mon Mar 30 23:29:09 2009 -0700 15914:bb9557896640'
SunOS Release 5.11 Version snv_112 32-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
NOTICE: Invalid iBFT table 0x1
# mount -o remount,rw /devices/ramdisk:a /
# /sbin/soconfig -f /etc/sock2path
# ifconfig lo0 plumb 127.0.0.1 netmask 255.255.255.0 up
# cd /dev
# ln -s ../devices/xpvd/xnf@0:xnf0 xnf0
# ifconfig xnf0 plumb 192.168.0.91 netmask 255.255.255.0 up
# route add default 192.168.0.1
add net default: gateway 192.168.0.1
# ping 192.168.0.1
192.168.0.1 is alive
# df -lk
Filesystem            kbytes    used   avail capacity  Mounted on
/devices/ramdisk:a     38255   28442    5988    83%    /
/devices                   0       0       0     0%    /devices
/dev                       0       0       0     0%    /dev
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                  198852       0  198852     0%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
sharefs                    0       0       0     0%    /etc/dfs/sharetab
# mdb -K

Welcome to kmdb
kmdb: no terminal data available for TERM=
kmdb: failed to set terminal type to `', using `vt100'
Loaded modules: [ scsi_vhci mac xpv_psm ufs unix krtld genunix specfs xpv_uppc
 ]
[0]> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      11628                45   18%
Anon                          140                 0    0%
Exec and libs                 129                 0    0%
Page cache                    711                 2    1%
Free (cachelist)             1435                 5    2%
Free (freelist)             49444               193   78%
Balloon                         0                 0    0%

Total                       63487               247
[0]> 

What's next? I'm trying to get VirtualBox (CLI only) running (RDP for external access) :-)

Monday Mar 30, 2009

Intel Xeon 5500 series

Congratulations to Intel on their new Intel Xeon 5500 series. It is a truly remarkable CPU, and in my opinion, will be looked back on as a game changer in the industry.

There are features which are easy to see, such as the integrated memory controllers and the Quick Path Interconnects (QPI) (which connect the CPUs and IO bridges to each other).

But, as you would expect, they also continue to improve the guts of the CPUs. For example, with the 5500 series and associated IO chipsets, comes virtualization improvements including better virtualized CPU performance, and the ability to safely passthrough an IO device to a virtualized guest.

With Intel's help, we backported a lot of the new functionality from xen-unstable.hg into OpenSolaris xVM in April of 2008, including the big ones, Extended Page Table (EPT) support and Virtual Processor IDs (VPID). So we've been ready for a while :-)

With the EPT support, you can bypass the shadow page code in the hypervisor for fully virtualized guests. This gives you a nice performance improvement, and has the added benefit of not having to run a *lot* of complex and, occasionally, buggy code.

I've just scratched the surface of the virtualization improvements in the Intel Xeon 5500 series. For folks who enjoy CPU technology, we live in exciting times.

Monday Jan 22, 2007

What's an iPhone without the phone?

What's an iPhone without the phone? My guess: the nextgen iPod with iChat w/ audio, or a Skype-like app? Seems like the logical progression to me...

Monday Jul 17, 2006

Bringing up Solaris domain 0 on Xen

Bringing up Solaris domain 0 (dom0) on Xen was surprisingly easy. Mostly because all of the hard work was already done by other people. The hard work which remained was also done by other people :-)

I apologize in advance for giving credit to the wrong folks or for taking credit for something I didn't do. This was such a blur, it all tends to blend together...

Obviously, this won't cover everything. I tried to talk about some of the more interesting parts. Well, interesting is relative of course :-)

To start with, you first need to be able to build xen on Solaris. You could actually cheat and start with a xen image and skip all the user apps to manage domUs. But that seems kind of pointless unless you have tons of bodies to throw at the effort, which we don't, thankfully.

John L and Dave already had Xen building, so all I had to do was ask them what I needed to do to build it.. The first things you need are changes to the gcc and binutils that are shipped in /usr/sfw. Which is why you need to download unofficially updated SUNWgcc, SUNWgccruntime, and SUNWbinutils packages in order to build the xen sources on Solaris (they will be officially updated at some point in the future).

There were two things that John L fixed. The first one was a bug in how we build gcc (it can't find its own ld scripts). See this bug.

The second fix was to add a -divide option to the binutils gas so it doesn't treat / as a comment. John got this change back into the binutils cvs repository, but it hasn't made it out in a release yet (as far as I know).

Of course, Dave and John L had to change stuff in the xen.hg gate to get it to compile too. If you look at the source, you'll notice there are a few things we don't currently try to compile, e.g. the hvm related support. Then, of course, you need to test it to make sure the xen binary worked (the user apps would have to wait until Solaris dom0 was up). Not sure if it just worked or they had to debug it, but it was working by the time I got to it :-)

So I built my xen gate, put xen.gz in /boot (starting with a 32-bit dom0), and tried to boot an i86xen (vs i86pc) version of the kernel debugger (kmdb). Again, I was following footsteps here. John L had done a ton of work getting kmdb to work in a domU (since we already had Solaris domU running on a Linux dom0). And Todd and/or John L had already debugged kmdb on a Solaris dom0. So I was at the kmdb prompt, ready to venture into unknown territory.

So before I could boot my Solaris dom0, I had to build one. Up to this point, we only had the driver changes we needed for domU. Before xen, we only had one x86 "platform", i86pc.

This is unlike SPARC, which usually gets a new "platform" for every major architecture change (e.g. sun4m, sun4u, sun4v). On SPARC, you'll also see machine specific platmods and platform directories which provide additional functionality and modules specific to a given machine (e.g. /platform/SUNW,Sun-Fire-880).

For xen (on x86), we have a new "platform", i86xen. For Solaris dom0, we were missing all of the drivers which were in i86pc (i.e. they did not show up in i86xen). The vast majority of these drivers aren't platform specific and can go into intel, which is shared across the x86 platforms (today, i86pc and i86xen). So I had to go through each driver and see whether it had platform specific code or not. Since there was only one intel "platform" in the past, the lines were a little gray at times. But I finally got through it and ended up moving around 40 drivers in src/uts, and a little over 15 in closed/uts, from i86pc to intel. For the rest, I needed to create makefiles in i86xen to build platform specific versions of these drivers.

Now I had a Solaris dom0 kernel to boot. I set up my cap-eye install kernel, rebooted into kmdb, and :c'd into a new world. The majority of the hard work was already done bringing up domU. The CPU and VM code for domU, done by Tim, Todd, and Joe, just worked for domain 0. That made life very simple.

The first problem I ran into was the internal PCI config access setup in mlsetup. It was initially shut off for domU; I had added it back in for dom0. However, it requires a call into the BIOS, which xen doesn't allow. So I changed the code to default to PCI_MECHANISM_1 for i86xen dom0.

From there, the next problem I ran into was that ins/outs weren't working. That was fixed with a HYPERVISOR_physdev_op (PHYSDEVOP_SET_IOPL) call, which ended up being slightly wrong and was fixed by Todd before we released.

Now I was at the point where we attach drivers and the drivers try to map in their registers. Joe had done a bunch of work in the VM getting the infrastructure ready for foreign PFNs, which are basically PFNs tagged to mark them as containing the real MFN, instead of being present in the mfn_list. Since this was the first time trying that code out, I ran into a couple of minor bugs. The more interesting problem was that a debug version of Xen was using one of the software PTE bits, which conflicted with the bit we were using to mark the page as foreign. I commented out that feature, rebuilt Xen, and continued on while Joe worked on changing the PTE software bits to be an encoded value instead of individual flags, to avoid bit 2 in the PTE software field.

I had already changed the code in rootnex to convert the MFN (device register access) to a foreign PFN during ddi_regs_map_setup(). So once the PTE software bit conflict was resolved, we were sailing through the driver reading its device registers and on to mapping memory for device DMA.
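The encoded-vs-flags change Joe made is easy to picture: an x86 PTE reserves three software-available bits (9..11). Used as individual flags you get only three booleans, and any single bit can collide with what the hypervisor uses; encoded as a field you get up to eight distinct states. A toy sketch of the idea (the constants and state names here are hypothetical, not the actual Solaris definitions):

```c
#include <stdint.h>

/*
 * x86 PTEs reserve bits 9..11 for software use. Treating them as an
 * encoded 3-bit field instead of three independent flags allows up to
 * eight states and avoids pinning meaning to any one bit.
 * (Hypothetical names, for illustration only.)
 */
#define	PTE_SW_SHIFT	9
#define	PTE_SW_MASK	(0x7ull << PTE_SW_SHIFT)

enum pte_sw_state {
	PTE_SW_NONE = 0,
	PTE_SW_FOREIGN = 1,	/* page maps a foreign MFN (e.g. device regs) */
	PTE_SW_OTHER = 2
};

static uint64_t
pte_set_sw(uint64_t pte, enum pte_sw_state s)
{
	/* clear the field, then store the encoded state */
	return ((pte & ~PTE_SW_MASK) | ((uint64_t)s << PTE_SW_SHIFT));
}

static enum pte_sw_state
pte_get_sw(uint64_t pte)
{
	return ((enum pte_sw_state)((pte & PTE_SW_MASK) >> PTE_SW_SHIFT));
}
```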

I had also modified the rootnex dma bind routine. When we're building dma cookies, we need to put MFNs in the cookies instead of PFNs. I had a couple of bugs in that code; once those were fixed up, I ran into the contig alloc code path. I hadn't coded up the contig alloc changes yet (where we want to allocate physically contiguous memory). So I cheated and temporarily took out all the drivers which required contig alloc, and did the contig alloc code at a later time (my boot device didn't need it :-) )
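The reason the cookies need MFNs: under Xen, a paravirtualized domain sees pseudo-physical frame numbers (PFNs), which are mapped to real machine frame numbers (MFNs) by a per-domain table, but the device itself DMAs using machine addresses. A toy model of the substitution (this is just the concept; the field and function names mimic the DDI but the table and logic here are made up):

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Toy per-domain PFN -> MFN table. In a real PV guest this mapping
 * is maintained with the hypervisor; the values here are arbitrary.
 */
static const uint64_t mfn_list[] = { 0x1000, 0x0042, 0x9abc, 0x0007 };

static uint64_t
pfn_to_mfn(uint64_t pfn)
{
	return (mfn_list[pfn]);
}

struct dma_cookie {
	uint64_t dmac_laddress;	/* machine (bus) address seen by the device */
	size_t	 dmac_size;
};

/*
 * Build a one-page DMA cookie from a pseudo-physical page. The key
 * point: the address placed in the cookie is derived from the MFN,
 * not the PFN, since the device bypasses the PFN->MFN translation.
 */
static struct dma_cookie
make_cookie(uint64_t pfn, size_t pagesize)
{
	struct dma_cookie c;

	c.dmac_laddress = pfn_to_mfn(pfn) * pagesize;
	c.dmac_size = pagesize;
	return (c);
}
```

This also hints at why contig alloc is harder under Xen: pages that are contiguous in PFN space are generally not contiguous in MFN space, so "physically contiguous" memory has to be requested from the hypervisor.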

Now I was up to vfs_mountroot(). This is where the Solaris drivers start taking over disk activity and stop using the BIOS to load blocks. This is also where we first start noticing problems if interrupts don't work.

This is where I handed off to Stu :-). This was the last of the hard problems. Stu had been busily working on Solaris dom0 interrupt support: a mix of event channels, pcplusmp, ACPI, and APICs. Something I would never wish on anyone. Stu got it up and working remarkably fast (something he should talk about :-)), and I was back up and running, all the way to the console handover.

The console config code is a little bit messy in Solaris. I waded through that for a little bit. All of the code was originally in the common intel part of the tree. I moved the platform specific code to i86pc and i86xen, with a different implementation in i86xen which basically always sends the Solaris console to the Xen console. Not sure if it will stay that way in the end, but that makes the most sense IMO.

And from there, I was at the multi-user prompt..

Some other interesting problems I ran into during the bringup: I had to make isa fail to attach on a Solaris domU, since the ISA leaf drivers assume the device is present and bad things happen. There were a couple of places in the kernel with hard coded physical addresses which they try to map in (e.g. psm_map_phys_new; the lower 1M of memory, used for BIOS tables, etc.; and xsvc, used by Xorg/Xsun). And we found out the hard way that Xen's low mem alloc implementation is linux specific: it only allocates memory < 4G && > 2G. We need to redo our first pass at implementing memory constrained allocs.

As far as booting 64-bit Solaris dom0, it booted up the first time.

Well, that's enough for now.. I'll save the bringup of domUs on a Solaris dom0 for the next post. That was a little more challenging...
