Wednesday Apr 23, 2008

zones and ZFS file systems

Starting off with a freshly created pool, let's see the steps to create a zone based on a ZFS file system. Here we see our new pool with only one file system:

fsh-sole# zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
kwame   160K  7.63G    18K  /kwame
fsh-sole#

Now, we'll create and configure a local zone "ejkzone". Note that we set the zonepath inside the ZFS pool's mountpoint:

fsh-sole# zonecfg -z ejkzone
ejkzone: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:ejkzone> create
zonecfg:ejkzone> set zonepath=/kwame/kilpatrick
zonecfg:ejkzone> commit
zonecfg:ejkzone> exit
fsh-sole#

Now we install zone "ejkzone" and notice that the installation tells us that it will automatically create a ZFS file system for us:

fsh-sole# zoneadm -z ejkzone install
A ZFS file system has been created for this zone.
Preparing to install zone <ejkzone>.
Creating list of files to copy from the global zone.
Copying <10116> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <1198> packages on the zone.
Initialized <1198> packages on zone.                                 
Zone <ejkzone> is initialized.
The file  contains a log of the zone installation.
fsh-sole#

Now we can boot the zone to use it, and can also see that the file system kwame/kilpatrick was automatically created for us:

fsh-sole# zoneadm -z ejkzone boot   
fsh-sole# zoneadm list
global
ejkzone
fsh-sole# zoneadm -z ejkzone list -v
  ID NAME             STATUS     PATH                           BRAND    IP    
   3 ejkzone          running    /kwame/kilpatrick              native   shared
fsh-sole# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
kwame              517M  7.12G    20K  /kwame
kwame/kilpatrick   517M  7.12G   517M  /kwame/kilpatrick
fsh-sole# 

Now if we log into the zone via 'zlogin -C ejkzone', we notice that the local zone cannot see any ZFS file systems (by default, only the global zone can):

ejkzone# zfs list
no datasets available
ejkzone# 

If we then want to create and delegate some ZFS file systems to the local zone "ejkzone" so that "ejkzone" has administrative control over the file systems, we can do that. From the global zone, we do:

fsh-sole# zfs create kwame/textme
fsh-sole# zonecfg -z ejkzone
zonecfg:ejkzone> add dataset
zonecfg:ejkzone:dataset> set name=kwame/textme
zonecfg:ejkzone:dataset> end
zonecfg:ejkzone> exit
fsh-sole#

Now, we can get the "zoned" property of the newly created file system:

fsh-sole# zfs get zoned kwame/textme 
NAME          PROPERTY  VALUE         SOURCE
kwame/textme  zoned     off           default
fsh-sole# 

Huh, it says "off". But we delegated it to a local zone. Why is that? Well, in order for this to take effect, we have to reboot the local zone.
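
The reboot itself isn't captured above; from the global zone, something like this does it (zoneadm also accepts halt/boot if you prefer two steps):

fsh-sole# zoneadm -z ejkzone reboot

After the reboot, we can see from the global zone: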

fsh-sole# zfs get zoned kwame/textme
NAME          PROPERTY  VALUE         SOURCE
kwame/textme  zoned     on            local
fsh-sole# 

And from the local zone "ejkzone":

ejkzone# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
kwame          595M  7.05G    20K  /kwame
kwame/textme    18K  7.05G    18K  /kwame/textme
ejkzone# 

And now we have administrative control over the file system via the local zone:

ejkzone# zfs get copies kwame/textme 
NAME          PROPERTY  VALUE         SOURCE
kwame/textme  copies    1             default
ejkzone# zfs set copies=2 kwame/textme
ejkzone# zfs get copies kwame/textme  
NAME          PROPERTY  VALUE         SOURCE
kwame/textme  copies    2             local
ejkzone# 

Double checking on the global zone:

fsh-sole# zfs get copies kwame/textme
NAME          PROPERTY  VALUE         SOURCE
kwame/textme  copies    2             local
fsh-sole# zpool history -l
History for 'kwame':
2008-04-23.16:01:17 zpool create -f kwame c1d0s3 [user root on fsh-sole:global]
2008-04-23.16:29:42 zfs create kwame/textme [user root on fsh-sole:global]
2008-04-23.16:36:45 zfs set copies=2 kwame/textme [user root on fsh-sole:ejkzone]

fsh-sole# 
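
And if you later want to take the delegation back, removing the dataset resource works the same way (a rough sketch; like the delegation itself, it takes effect on the next zone reboot):

fsh-sole# zonecfg -z ejkzone
zonecfg:ejkzone> remove dataset name=kwame/textme
zonecfg:ejkzone> commit
zonecfg:ejkzone> exit
fsh-sole# zoneadm -z ejkzone reboot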

Happy zoning

Wednesday Mar 14, 2007

iSCSI storage with zvols

Now that iSCSI support is built into ZFS, let's see how to set up some storage with zvols.

On the server, we create a pool, a sparse 10GB zvol (the -s flag means no space is reserved up front), and share the zvol over iSCSI:

fsh-suzuki# zpool create iscsistore c0t1d0
fsh-suzuki# zfs create -s -V 10gb iscsistore/zvol
fsh-suzuki# zfs set shareiscsi=on iscsistore/zvol
fsh-suzuki# iscsitadm list target -v
Target: iscsistore/zvol
    iSCSI Name: iqn.1986-03.com.sun:02:a7f19760-5d17-ee50-f011-c4c749add692
    Alias: iscsistore/zvol
    Connections: 0
    ACL list:
    TPGT list:
    LUN information:
        LUN: 0
            GUID: 0x0
            VID: SUN
            PID: SOLARIS
            Type: disk
            Size:   10G
            Backing store: /dev/zvol/rdsk/iscsistore/zvol
            Status: online
fsh-suzuki# 

Now on the client, we need to discover the iSCSI share (192.168.16.135 is the IP of the server):

fsh-weakfish# iscsiadm list discovery
Discovery:
        Static: disabled
        Send Targets: enabled
        iSNS: disabled
fsh-weakfish# iscsiadm modify discovery --sendtargets enable
fsh-weakfish# iscsiadm add discovery-address 192.168.16.135
fsh-weakfish# svcadm enable network/iscsi_initiator
fsh-weakfish# iscsiadm list target
Target: iqn.1986-03.com.sun:02:a7f19760-5d17-ee50-f011-c4c749add692
        Alias: iscsistore/zvol
        TPGT: 1
        ISID: 4000002a0000
        Connections: 1
fsh-weakfish# 
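
If the new lun doesn't show up on the client right away, rebuilding the iSCSI device links usually does the trick (not needed in the run above, just a handy fallback):

fsh-weakfish# devfsadm -i iscsi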

Now we can create a pool on the client using the iSCSI device:

fsh-weakfish# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c1t1d0 
          /pci@0,0/pci1022,7450@a/pci17c2,10@4/sd@1,0
       1. c2t01000003BAAAE84F00002A0045F86E49d0 
          /scsi_vhci/disk@g01000003baaae84f00002a0045f86e49
Specify disk (enter its number): ^C
fsh-weakfish# zpool create i c2t01000003BAAAE84F00002A0045F86E49d0
fsh-weakfish# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
i                      9.94G     89K   9.94G     0%  ONLINE     -
fsh-weakfish# zpool status
  pool: i
 state: ONLINE
 scrub: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        i                                        ONLINE       0     0     0
          c2t01000003BAAAE84F00002A0045F86E49d0  ONLINE       0     0     0

errors: No known data errors
fsh-weakfish# 

Yep, that's it!
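
And if you ever want to tear it back down, a rough sketch of the cleanup: destroy the pool on the client, drop the discovery address, then stop sharing the zvol on the server:

fsh-weakfish# zpool destroy i
fsh-weakfish# iscsiadm remove discovery-address 192.168.16.135
fsh-suzuki# zfs set shareiscsi=off iscsistore/zvol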

Friday Aug 04, 2006

iSCSI setup

Now that both the iSCSI target and initiator are in Solaris, let's see how to set one up. On the target machine, we do the following...

First, use 'iscsitadm' to provide a directory where the iSCSI repository and configuration information is stored. This is also the directory where the logical units will be stored if you create luns that are disk-emulated files (the default).

hodur# iscsitadm modify admin -d /var/tmp/iscsi

Now actually create the luns. I chose to create pass-through raw devices instead of file-based, disk-emulating luns.

hodur# iscsitadm create target --type raw --backing-store /dev/rdsk/c4t2d0 c4t2d0
hodur# iscsitadm create target --type raw --backing-store /dev/rdsk/c4t4d0 c4t4d0
hodur# iscsitadm create target --type raw --backing-store /dev/rdsk/c4t8d0 c4t8d0
hodur# iscsitadm create target --type raw --backing-store /dev/rdsk/c4t12d0 c4t12d0
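
For contrast, a file-based, disk-emulating lun (the default type) only needs a size and a target name; a sketch with a made-up name and size:

hodur# iscsitadm create target --size 10g emulated-disk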

Let's get a listing of the targets on this machine:

hodur# iscsitadm list target 
Target: c4t2d0
    iSCSI Name: iqn.1986-03.com.sun:02:aaf6d680-6681-e36e-8497-855decbf8038.c4t2d0
    Connections: 0
Target: c4t4d0
    iSCSI Name: iqn.1986-03.com.sun:02:f935aa89-d195-6ed9-9a1a-febe8d0550d2.c4t4d0
    Connections: 0
Target: c4t8d0
    iSCSI Name: iqn.1986-03.com.sun:02:5af80810-aa7c-c8c2-e566-af70bd579219.c4t8d0
    Connections: 0
Target: c4t12d0
    iSCSI Name: iqn.1986-03.com.sun:02:b3120e6c-896c-e3e3-c908-8a43789997d9.c4t12d0
    Connections: 0
hodur#

Now onto the initiator:

fsh-mullet# iscsiadm add discovery-address 
fsh-mullet# iscsiadm modify discovery -t enable

We wait a few seconds, and then verify the status on the target machine (notice each target now shows one connection from the initiator, and its lun is online):

hodur# iscsitadm list target -v
Target: c4t2d0
    iSCSI Name: iqn.1986-03.com.sun:02:aaf6d680-6681-e36e-8497-855decbf8038.c4t2d0
    Connections: 1
        Initiator:
            iSCSI Name: iqn.1986-03.com.sun:01:0003ba73b35b.44d23863
            Alias: fsh-mullet
    ACL list:
    TPGT list:
    LUN information:
        LUN: 0
            GUID: 010000e0812a8f5300002a0044d3ef91
            VID: SUN
            PID: SOLARIS
            Type: raw
            Size:   34G
            Backing store: /dev/rdsk/c4t2d0
            Status: online
Target: c4t4d0
    iSCSI Name: iqn.1986-03.com.sun:02:f935aa89-d195-6ed9-9a1a-febe8d0550d2.c4t4d0
    Connections: 1
        Initiator:
            iSCSI Name: iqn.1986-03.com.sun:01:0003ba73b35b.44d23863
            Alias: fsh-mullet
    ACL list:
    TPGT list:
    LUN information:
        LUN: 0
            GUID: 010000e0812a8f5300002a0044d3ef8e
            VID: SUN
            PID: SOLARIS
            Type: raw
            Size:   34G
            Backing store: /dev/rdsk/c4t4d0
            Status: online
Target: c4t8d0
    iSCSI Name: iqn.1986-03.com.sun:02:5af80810-aa7c-c8c2-e566-af70bd579219.c4t8d0
    Connections: 1
        Initiator:
            iSCSI Name: iqn.1986-03.com.sun:01:0003ba73b35b.44d23863
            Alias: fsh-mullet
    ACL list:
    TPGT list:
    LUN information:
        LUN: 0
            GUID: 010000e0812a8f5300002a0044d3ef8a
            VID: SUN
            PID: SOLARIS
            Type: raw
            Size:   34G
            Backing store: /dev/rdsk/c4t8d0
            Status: online
Target: c4t12d0
    iSCSI Name: iqn.1986-03.com.sun:02:b3120e6c-896c-e3e3-c908-8a43789997d9.c4t12d0
    Connections: 1
        Initiator:
            iSCSI Name: iqn.1986-03.com.sun:01:0003ba73b35b.44d23863
            Alias: fsh-mullet
    ACL list:
    TPGT list:
    LUN information:
        LUN: 0
            GUID: 010000e0812a8f5300002a0044d3ef87
            VID: SUN
            PID: SOLARIS
            Type: raw
            Size:   34G
            Backing store: /dev/rdsk/c4t12d0
            Status: online
hodur# 

Now back on the initiator, we check to see if we can see those devices:

fsh-mullet# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0t0d0 
          /pci@1c,600000/scsi@2/sd@0,0
       1. c0t1d0 
          /pci@1c,600000/scsi@2/sd@1,0
       2. c2t1d0 
          /iscsi/disk@0000iqn.1986-03.com.sun%3A02%3Ab3120e6c-896c-e3e3-c908-8a43789997d9.c4t12d00001,0
       3. c2t2d0 
          /iscsi/disk@0000iqn.1986-03.com.sun%3A02%3A5af80810-aa7c-c8c2-e566-af70bd579219.c4t8d00001,0
       4. c2t4d0 
          /iscsi/disk@0000iqn.1986-03.com.sun%3A02%3Af935aa89-d195-6ed9-9a1a-febe8d0550d2.c4t4d00001,0
       5. c2t5d0 
          /iscsi/disk@0000iqn.1986-03.com.sun%3A02%3Aaaf6d680-6681-e36e-8497-855decbf8038.c4t2d00001,0
Specify disk (enter its number): ^C
fsh-mullet# 

Hmm, perhaps i could even create a RAID-Z out of those luns...

fsh-mullet# zpool create -f iscs-me raidz c2t1d0 c2t2d0 c2t4d0 c2t5d0
fsh-mullet# zpool status
  pool: iscs-me
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        iscs-me     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0

errors: No known data errors
fsh-mullet# 

Do i hear Mr. Koolaid? ... "OH YEAH!"

Thursday Feb 02, 2006

creating a stripe in ZFS vs. SVM/UFS

To create a 4 disk stripe in ZFS:

hodur# zpool create zfs_bonnie c4t2d0 c4t4d0 c4t8d0 c4t12d0
hodur# df -kh zfs_bonnie
Filesystem             size   used  avail capacity  Mounted on
zfs_bonnie             134G    26K   134G     1%    /zfs_bonnie
hodur# 

To create a 4 disk stripe in SVM:

hodur# metadb -a -f -c2 c4t2d0s0 c4t4d0s0

In the above command, the metadb is where SVM stores its metadata, such as the stripe width for stripes or the dirty region for mirrors. You can technically get away with adding only one metadb, but having two adds redundancy. You could also go with four (one on each disk), but that just becomes overkill, as the master metadb periodically needs to sync with the slave(s). And yes, it would be really nice if this were all automated (hmm, like the above zfs command). Next...

hodur# metainit d1 1 4 c4t2d0s0 c4t4d0s0 c4t8d0s0 c4t12d0s0 -i 256k
d1: Concat/Stripe is setup
hodur#

In the above command, the "1" tells metainit to create one stripe, the "4" tells it how many slices make up that stripe, and the "-i 256k" sets the stripe width (interlace) to 256kB instead of the default 16kB. Continuing...

hodur# newfs /dev/md/rdsk/d1
newfs: construct a new file system /dev/md/rdsk/d1: (y/n)? y
Warning: 4096 sector(s) in last cylinder unallocated
/dev/md/rdsk/d1:        284327936 sectors in 46278 cylinders of 48 tracks, 128 sectors
        138832.0MB in 2893 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
.........................................................
super-block backups for last 10 cylinder groups at:
 283410848, 283509280, 283607712, 283706144, 283804576, 283903008, 284001440,
 284099872, 284198304, 284296736
hodur# mkdir /ufs_bonnie
hodur# mount -F ufs /dev/md/dsk/d1 /ufs_bonnie
hodur# df -kh ufs_bonnie        
Filesystem             size   used  avail capacity  Mounted on
/dev/md/dsk/d1         134G   6.9G   125G     6%    /ufs_bonnie
hodur# 
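
To double check the SVM side of things (the four components and the 256k interlace), metastat shows the layout:

hodur# metastat d1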

One method is straightforward; the other method caused me to write a blog entry so i'd remember how to do it.
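
The asymmetry shows up again at teardown time; roughly (not part of the run above):

hodur# zpool destroy zfs_bonnie

versus:

hodur# umount /ufs_bonnie
hodur# metaclear d1
hodur# metadb -d -f c4t2d0s0 c4t4d0s0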

Wednesday Feb 01, 2006

enabling the write cache

To enable the write cache you can use the format(1M) command:

i_like_corruption# format -e
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0t0d0 
          /pci@1c,600000/scsi@2/sd@0,0
       1. c0t1d0 
          /pci@1c,600000/scsi@2/sd@1,0
Specify disk (enter its number): 1
selecting c0t1d0
[disk formatted]
/dev/dsk/c0t1d0s0 is in use by zpool zfs_tar. Please see zpool(1M).


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        inquiry    - show vendor, product and revision
        scsi       - independent SCSI mode selects
        cache      - enable, disable or query SCSI disk cache
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> cache


CACHE MENU:
        write_cache - display or modify write cache settings
        read_cache  - display or modify read cache settings
        !<cmd>      - execute <cmd>, then return
        quit
cache> write      


WRITE_CACHE MENU:
        display     - display current setting of write cache
        enable      - enable write cache
        disable     - disable write cache
        !<cmd>      - execute <cmd>, then return
        quit
write_cache> display
Write Cache is disabled
write_cache> enable
write_cache> display
Write Cache is enabled
write_cache> q

Now, of course, be careful when you do this as data corruption can happen!

It was nice for me to do this, as i was taking a simple v20z with a single SCSI disk to run some prelim specSFS numbers, and the write cache mitigates (somewhat) the cost of the synchronous writes/commits. Of course, you need much more than a single SCSI disk to get real specSFS numbers.

Or if you want to be snazzier, Neil wrote a script to do this:

i_like_corruption# write_cache display c0t1d0
Write Cache is disabled
i_like_corruption# write_cache enable c0t1d0 
Write Cache is enabled
i_like_corruption# write_cache display c0t1d0
Write Cache is enabled
i_like_corruption# 

Here's the script:

#!/bin/ksh
#
# issue write cache commands to format(1M).
# commands can be to individual disks or all disks.
#
# usage: write_cache [-v] display|enable|disable all|<disk>
# E.g.:
#    write_cache disable all
#    write_cache -v enable c2t1d0

id=`id | sed -e 's/uid=//' -e 's/(.*//'`
if [ "$id" != "0" ] ; then
        printf "No permissions\n"
        exit 1
fi

# tmp files
cmds=/tmp/write_cache_commands.txt.$$
disks=/tmp/all_disk_list.txt.$$

silent=-s
if [ "$1" = "-v" ] ; then
        # in verbose mode turn off silent format option
        silent=
        shift
fi

cat > $cmds << EOF
cache
write_cache
$1
EOF

if [ "$2" = "all" ]; then
        echo disk | format 2>/dev/null | fgrep ". c" \
            | nawk '{ print $2 }' > $disks
        for i in `cat $disks`
        do
                format -e $silent -f $cmds $i 2>/dev/null
                if [ "$silent" = "-s" ]; then
                        # print write cache state using recursion
                        printf "%s : " $i
                        write_cache -v display $i | fgrep "Write Cache is"
                fi
        done
else
        format -e $silent -f $cmds $2
        if [ "$silent" = "-s" ]; then
                # print write cache state using recursion
                write_cache -v display $2 | fgrep "Write Cache is"
        fi
fi

rm -f $cmds $disks

Oh yeah, make sure it's in your PATH. It can be run like this:

hodur# write_cache enable c4t2d0 
Write Cache is enabled
hodur# write_cache disable c4t2d0
Write Cache is disabled
hodur# 
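
The script also accepts "all", which loops over every disk that format reports:

hodur# write_cache display all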

Friday Apr 08, 2005

setting up b2b machines on Solaris

Ok, setting up back-to-back machines should be simple, right? Mostly it is, but sometimes I do everything right and some f-ing ethernet cable is busted and has me questioning which file i forgot to change. So here you go:

  • edit /etc/nsswitch.conf to have 'hosts' and 'ipnodes' look for 'files' before DNS/NIS/etc.
  • make up some fake IP addresses and hostnames - say 10.x.x.y (client-b2b) and 10.x.x.y+1 (server-b2b), and add those to /etc/hosts and /etc/inet/ipnodes on both the client and server (see the sketch after this list for what these files end up looking like)
  • add a line to /etc/netmasks (with the above fake IPs, '10.x.x.0 255.255.255.0' would work)
  • create a /etc/hostname.[interface name][interface number] -- for me, i'm using a v20z and it has a dual port internal bge interface, so i create /etc/hostname.bge1 (since /etc/hostname.bge0 already exists), and fill it with the b2b machine name ("server-b2b" or "client-b2b" in this example)
  • hook up your cable and reboot both machines
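
For reference, here's roughly what those files end up containing, using made-up 10.1.1.x addresses (substitute your own):

# /etc/hosts and /etc/inet/ipnodes, on both machines
10.1.1.1   client-b2b
10.1.1.2   server-b2b

# /etc/netmasks, on both machines
10.1.1.0   255.255.255.0

# /etc/hostname.bge1 on the server (one line, just the name)
server-b2b

If you'd rather not reboot, plumbing the interface by hand should also work (the /etc/hostname.bge1 file just makes it persistent): 'ifconfig bge1 plumb server-b2b netmask + broadcast + up'.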

Now let's check the client:

client# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
  inet 127.0.0.1 netmask ff000000
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
  inet a.b.c.d netmask ffffff00 broadcast a.b.c.255
  ether f:f:f:f:f:f
bge1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
  inet 10.x.x.y netmask ffffff00 broadcast 10.x.x.255
  ether f:f:f:f:f:f
client#

Note that if bge1 isn't 'RUNNING', then something is probably wrong.
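
A quick way to tell a dead link from an IP-level problem is to ask the driver; on Solaris 10 and later, something like this should report the link state and speed:

client# dladm show-dev bge1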

Now, let's make sure it worked:

client# traceroute server-b2b
traceroute: Warning: Multiple interfaces found; using 10.x.x.y @ bge1
traceroute to server-b2b (10.x.x.y+1), 30 hops max, 40 byte packets
  1 server-b2b (10.x.x.y+1) 0.822 ms 0.323 ms 0.244 ms
client#

client# netstat -rn | grep 10.x.x.0
10.x.x.0  10.x.x.y  U  1  1  bge1
client#

Woo wee! And yes, the 10.x.x.y, a.b.c.d, and ether addresses are all totally bogus.
