Thursday Oct 11, 2012

Oracle Solaris 11 ZFS Lab for OpenWorld 2012

Preface

This is the content from the Oracle OpenWorld 2012 ZFS lab. It was well attended - the feedback was that it was a little short - that's probably because in writing it I became very time-conscious after the ASM/ACFS on Solaris extravaganza I ran last year, which was almost too long for mortal man to finish in the 1 hour session. Enjoy.

Introduction

This set of exercises is designed to briefly demonstrate new features in the Solaris 11 ZFS file system: Deduplication, Encryption and Shadow Migration. Also included are the creation of zpools and ZFS file systems - the basic building blocks of the technology - and Compression, which is the complement of Deduplication. The exercises are just introductions - you are referred to the ZFS Administration Manual for further information. From Solaris 11 onward, the online manual pages consist of zpool(1M) and zfs(1M), with further feature-specific information in zfs_allow(1M), zfs_encrypt(1M) and zfs_share(1M). The lab is easily carried out in a VirtualBox VM running Solaris 11 with six virtual 3 GB disks to play with.

Exercise Z.1: ZFS Pools

Task: You have several disks to use for your new file system. Create a new zpool and a file system within it.

Lab: You will check the status of existing zpools, create your own pool and expand it.

Your Solaris 11 installation already has a root ZFS pool. It contains the root file system. Check this:

root@solaris:~# zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  15.9G  6.62G  9.25G  41%  1.00x  ONLINE  -

root@solaris:~# zpool status 
pool: rpool
state: ONLINE
scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
rpool       ONLINE       0     0     0
  c3t0d0s0  ONLINE       0     0     0

errors: No known data errors

Note the disk device the root pool is on - c3t0d0s0

Now you will create your own ZFS pool. First you will check what disks are available:

root@solaris:~# echo | format 
Searching for disks...done

AVAILABLE DISK SELECTIONS:
0. c3t0d0 <ATA-VBOX HARDDISK-1.0 cyl 2085 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0 <ATA-VBOX HARDDISK-1.0 cyl 1534 alt 2 hd 128 sec 32>
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t3d0 <ATA-VBOX HARDDISK-1.0 cyl 1534 alt 2 hd 128 sec 32>
/pci@0,0/pci8086,2829@d/disk@3,0
3. c3t4d0 <ATA-VBOX HARDDISK-1.0 cyl 1534 alt 2 hd 128 sec 32>
/pci@0,0/pci8086,2829@d/disk@4,0
4. c3t5d0 <ATA-VBOX HARDDISK-1.0 cyl 1534 alt 2 hd 128 sec 32>
/pci@0,0/pci8086,2829@d/disk@5,0
5. c3t6d0 <ATA-VBOX HARDDISK-1.0 cyl 1534 alt 2 hd 128 sec 32>
/pci@0,0/pci8086,2829@d/disk@6,0
6. c3t7d0 <ATA-VBOX HARDDISK-1.0 cyl 1534 alt 2 hd 128 sec 32>
/pci@0,0/pci8086,2829@d/disk@7,0
Specify disk (enter its number): Specify disk (enter its number): 

The root disk is numbered 0. The others are free for use. Try creating a simple (non-redundant) pool and observe the warning message:

root@solaris:~# zpool create mypool c3t2d0 c3t3d0 
'mypool' successfully created, but with no redundancy; failure of one
device will cause loss of the pool

So destroy that pool and create a mirrored pool instead:

root@solaris:~# zpool destroy mypool  
root@solaris:~# zpool create mypool mirror c3t2d0 c3t3d0 
root@solaris:~# zpool status mypool 
pool: mypool
state: ONLINE
scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
mypool      ONLINE       0     0     0
  mirror-0  ONLINE       0     0     0
    c3t2d0  ONLINE       0     0     0
    c3t3d0  ONLINE       0     0     0

errors: No known data errors
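
The lab description above also mentions expanding the pool. Here is a minimal, hedged sketch, using c3t6d0 and c3t7d0 because the later exercises do not touch them; if you run it, the pool and file system sizes shown in the rest of this lab will naturally be larger on your system:

root@solaris:~# zpool add mypool mirror c3t6d0 c3t7d0
root@solaris:~# zpool status mypool

The status output should now show a second top-level mirror vdev (mirror-1), and zpool list should report roughly twice the original capacity.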


Exercise Z.2: ZFS File Systems

Task: You have to create file systems for later exercises.

You can see that when a pool is created, a file system of the same name is created:

root@solaris:~# zfs list 
NAME                     USED  AVAIL  REFER  MOUNTPOINT
mypool                  86.5K  2.94G    31K  /mypool

Create your filesystems and mountpoints as follows:

root@solaris:~# zfs create -o mountpoint=/data1 mypool/mydata1 

The -o option sets the mount point and automatically creates the necessary directory.

root@solaris:~# zfs list mypool/mydata1 
NAME            USED  AVAIL  REFER  MOUNTPOINT
mypool/mydata1   31K  2.94G    31K  /data1
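
The mount point is just an ordinary ZFS property, so - as a hedged aside, with a made-up path - you can inspect or change it later using the same get/set syntax used throughout this lab. If you try the second command, set the mount point back to /data1 before continuing, because the later exercises copy data into that path:

root@solaris:~# zfs get mountpoint mypool/mydata1
root@solaris:~# zfs set mountpoint=/data1-elsewhere mypool/mydata1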


Exercise Z.3: ZFS Compression

Task: Try out different forms of compression available in ZFS

Lab: Create a second file system with compression enabled, fill both file systems with the same data, and observe the results

You can see from the zfs(1M) manual page that there are several types of compression available to you, set with the property=value syntax:

compression=on | off | lzjb | gzip | gzip-N | zle

    Controls the compression algorithm used for this dataset. The lzjb
    compression algorithm is optimized for performance while providing
    decent data compression. Setting compression to on uses the lzjb
    compression algorithm. The gzip compression algorithm uses the same
    compression as the gzip(1) command. You can specify the gzip level
    by using the value gzip-N where N is an integer from 1 (fastest) to
    9 (best compression ratio). Currently, gzip is equivalent to gzip-6
    (which is also the default for gzip(1)).

Create a second filesystem with compression turned on. Note how you set and get your values separately:

root@solaris:~# zfs create -o mountpoint=/data2 mypool/mydata2 
root@solaris:~# zfs set compression=gzip-9 mypool/mydata2 
root@solaris:~# zfs get compression mypool/mydata1 
NAME            PROPERTY     VALUE     SOURCE
mypool/mydata1  compression  off       default

root@solaris:~# zfs get compression mypool/mydata2 
NAME            PROPERTY     VALUE     SOURCE
mypool/mydata2  compression  gzip-9    local

Now you can copy the contents of /usr/lib into both your normal and compressing filesystem and observe the results. Don't forget the dot or period (".") in the find(1) command below:

root@solaris:~# cd /usr/lib
root@solaris:/usr/lib# find . -print | cpio -pdv /data1 
root@solaris:/usr/lib# find . -print | cpio -pdv /data2 

The copy into the compressing file system takes longer - as it has to perform the compression - but the results show the effect:

root@solaris:/usr/lib# zfs list 
NAME                     USED  AVAIL  REFER  MOUNTPOINT
mypool                  1.35G  1.59G    31K  /mypool
mypool/mydata1          1.01G  1.59G  1.01G  /data1
mypool/mydata2           341M  1.59G   341M  /data2
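
As a hedged extra check, ZFS also reports the ratio it achieved directly, via the compressratio property:

root@solaris:/usr/lib# zfs get compressratio mypool/mydata1 mypool/mydata2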

Note that the available space in the pool is shared amongst the file systems. This behavior can be modified using quotas and reservations, which are not covered in this lab but are covered extensively in the ZFS Administration Guide.
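
For example - a brief, hedged sketch only, with illustrative values - a quota caps how much a file system may consume, and a reservation guarantees it space out of the pool:

root@solaris:/usr/lib# zfs set quota=2g mypool/mydata1
root@solaris:/usr/lib# zfs set reservation=500m mypool/mydata2
root@solaris:/usr/lib# zfs get quota,reservation mypool/mydata1 mypool/mydata2

They can be removed again with quota=none and reservation=none.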


Exercise Z.4: ZFS Deduplication

The deduplication property is used to remove redundant data from a ZFS file system. With the property enabled, duplicate data blocks are removed synchronously. The result is that only unique data is stored and common components are shared.

Task: See how to implement deduplication and observe its effects

Lab: You will create a ZFS file system with deduplication turned on and see if it reduces the amount of physical storage needed when we again fill it with a copy of /usr/lib.

root@solaris:/usr/lib# zfs destroy mypool/mydata2 
root@solaris:/usr/lib# zfs set dedup=on mypool/mydata1
root@solaris:/usr/lib# rm -rf /data1/*  
root@solaris:/usr/lib# mkdir /data1/2nd-copy
root@solaris:/usr/lib# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
mypool                  1.02M  2.94G    31K  /mypool
mypool/mydata1            43K  2.94G    43K  /data1
root@solaris:/usr/lib# find . -print | cpio -pd /data1
2142768 blocks
root@solaris:/usr/lib# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
mypool                  1.02G  1.99G    31K  /mypool
mypool/mydata1          1.01G  1.99G  1.01G  /data1
root@solaris:/usr/lib# find . -print | cpio -pd /data1/2nd-copy 
2142768 blocks
root@solaris:/usr/lib# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
mypool                  1.99G  1.96G    31K  /mypool
mypool/mydata1          1.98G  1.96G  1.98G  /data1

You could go on creating copies for quite a while...but you get the idea. Note that deduplication and compression can be combined: blocks are compressed first, and deduplication then operates on the compressed blocks as they are written to disk.
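
As a hedged sketch (mypool/both is a hypothetical dataset used only for this aside), you could enable both features at creation time with multiple -o options and then tidy up:

root@solaris:/usr/lib# zfs create -o compression=on -o dedup=on -o mountpoint=/both mypool/both
root@solaris:/usr/lib# zfs get compression,dedup mypool/both
root@solaris:/usr/lib# zfs destroy mypool/both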

Deduplication works across file systems in a pool and there is a zpool-wide property dedupratio:

root@solaris:/usr/lib# zpool get dedupratio mypool
NAME    PROPERTY    VALUE  SOURCE
mypool  dedupratio  4.30x  -

Deduplication can also be checked using "zpool list":

root@solaris:/usr/lib# zpool list
NAME     SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
mypool  2.98G  1001M  2.01G  32%  4.30x  ONLINE  -
rpool   15.9G  6.66G  9.21G  41%  1.00x  ONLINE  -

Before moving on to the next topic, destroy that dataset and free up some space:

root@solaris:~# zfs destroy  mypool/mydata1 


Exercise Z.5: ZFS Encryption

Task: Encrypt sensitive data.

Lab: Explore basic ZFS encryption.

This lab only covers the basics of ZFS encryption. In particular it does not cover various aspects of key management. Please see the ZFS Administration Manual and the zfs_encrypt(1M) manual page for more detail on this functionality.

Create a new file system with encryption enabled; you will be prompted for a passphrase:

root@solaris:~# zfs create -o encryption=on mypool/data2 
Enter passphrase for 'mypool/data2': ********
Enter again: ********
root@solaris:~# 

Creation of a descendant dataset shows that encryption is inherited from the parent:

root@solaris:~# zfs create mypool/data2/data3 
root@solaris:~# zfs get -r  encryption,keysource,keystatus,checksum mypool/data2 
NAME                PROPERTY    VALUE              SOURCE
mypool/data2        encryption  on                 local
mypool/data2        keysource   passphrase,prompt  local
mypool/data2        keystatus   available          -
mypool/data2        checksum    sha256-mac         local
mypool/data2/data3  encryption  on                 inherited from mypool/data2
mypool/data2/data3  keysource   passphrase,prompt  inherited from mypool/data2
mypool/data2/data3  keystatus   available          -
mypool/data2/data3  checksum    sha256-mac         inherited from mypool/data2

You will find the online manual page zfs_encrypt(1M) contains examples. In particular, if time permits during this lab session, you may wish to explore changing a key using "zfs key -c mypool/data2".
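
A minimal, hedged sketch of that key management follows. The first command is the key change mentioned above (it should prompt for a new passphrase, since the keysource is passphrase,prompt); the second and third use a hypothetical raw key file and dataset name - the default encryption=on policy is aes-128-ccm, so the raw key must be 128 bits:

root@solaris:~# zfs key -c mypool/data2
root@solaris:~# pktool genkey keystore=file outkey=/root/mykey keytype=aes keylen=128
root@solaris:~# zfs create -o encryption=on -o keysource=raw,file:///root/mykey mypool/data4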

Exercise Z.6: Shadow Migration

Shadow Migration allows you to migrate data from an old file system to a new file system while simultaneously allowing access and modification to the new file system during the process. You can use Shadow Migration to migrate a local or remote UFS or ZFS file system to a local file system.

Task: You wish to migrate data from one file system (UFS, ZFS, VxFS) to ZFS while maintaining access to it.

Lab: Create the infrastructure for shadow migration and transfer one file system into another.

First create the file system you want to migrate:

root@solaris:~# zpool create oldstuff c3t4d0 
root@solaris:~# zfs create oldstuff/forgotten 

Then populate it with some files:

root@solaris:~# cd /var/adm 
root@solaris:/var/adm# find . -print | cpio -pdv /oldstuff/forgotten

You need the shadow-migration package installed:

root@solaris:~# pkg install shadow-migration
           Packages to install:  1
       Create boot environment: No
Create backup boot environment: No
            Services to change:  1

DOWNLOAD                                  PKGS       FILES    XFER (MB)
Completed                                  1/1       14/14      0.2/0.2

PHASE                                        ACTIONS
Install Phase                                  39/39

PHASE                                          ITEMS
Package State Update Phase                       1/1 
Image State Update Phase                         2/2 

You then enable the shadowd service:

root@solaris:~# svcadm enable shadowd
root@solaris:~# svcs shadowd 
STATE          STIME    FMRI
online          7:16:09 svc:/system/filesystem/shadowd:default

Set the file system to be migrated to read-only:

root@solaris:~# zfs set readonly=on oldstuff/forgotten

Create a new zfs file system with the shadow property set to the file system to be migrated:

root@solaris:~# zfs create -o shadow=file:///oldstuff/forgotten   mypool/remembered 

Use the shadowstat(1M) command to see the progress of the migration:

root@solaris:~# shadowstat 
                                         EST
                                BYTES    BYTES           ELAPSED
DATASET                         XFRD     LEFT    ERRORS  TIME
mypool/remembered               92.5M    -       -       00:00:59
mypool/remembered               99.1M    302M    -       00:01:09
mypool/remembered               109M     260M    -       00:01:19
mypool/remembered               133M     304M    -       00:01:29
mypool/remembered               149M     339M    -       00:01:39
mypool/remembered               156M     86.4M   -       00:01:49
mypool/remembered               156M     8E      29      (completed)

Note that if you had created mypool/remembered as an encrypted file system, this would be the preferred method of encrypting existing data; the same applies to compressing or deduplicating existing data.
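
A hedged sketch of what that would look like (mypool/remembered2 is a hypothetical dataset; as in Exercise Z.5 you would be prompted for a passphrase):

root@solaris:~# zfs create -o encryption=on -o compression=on -o shadow=file:///oldstuff/forgotten mypool/remembered2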

The procedure for migrating a file system over NFS is similar - see the ZFS Administration manual.
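
As a hedged illustration only - the host and export path here are invented - the shadow property also accepts an NFS URI:

root@solaris:~# zfs create -o shadow=nfs://oldserver/export/forgotten mypool/remembered-nfs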

That concludes this lab session.

Wednesday Nov 18, 2009

ZFS HSP Demo

Here is a demo of the use of ZFS Hybrid Storage Pools that I put together. It's quite neat in that it is reproducible fairly easily. My thanks to Mo Beik for doing the heavy lifting.

Open Storage Demos

Here is a series of brief Open Storage demos that I put together on various aspects of Solaris storage software such as ZFS, NFS, CIFS and iSCSI - just the basics to get you started.

I originally created them for SuperComputer 2008 and have only just rediscovered them. Enjoy.

Monday Jan 23, 2006

Filesystem Benchmarks: vxbench

For a long time I've used a simple I/O load generator from Veritas called vxbench for doing just that - generating I/O loads against systems that have been configured up either in the lab or on customer sites. vxbench is a tool available on AIX, HP-UX, Linux and Solaris for benchmarking I/O loads on raw disk or file systems. It can produce various I/O workloads such as sequential and random reads/writes, asynchronous I/O, and memory mapped (mmap) operations. It has many options specific to the VERITAS File System (VxFS).

It also has characteristics that I need in a simple load generator - specifically, it can generate multithreaded workloads, which are essential, and it has a simple command-line interface, which makes it easy to incorporate in a scripting harness. It can also do strided reads/writes and sleep - important for database-like operations.

vxbench arrives on the CD in the package VRTSspt - the Veritas Software Support Tools - and most sites have it to hand. However I've always shied away from publishing any work done with it because I've never quite pinned down its status as a piece of software in terms of copyright or license. Recently, however, I've been driven to take a closer look. Two papers appeared recently which I'm afraid I can cite but not give you a URL for:

  • Study of Linux I/O Performance Characteristics for Volume Managers on an Intel Xeon Server (2004) Xianneng Shen, Jim Nagler, Randy Taylor, Clark McDonald. Proc CMG 2004
  • I/O Performance Characteristics for Volume Managers on Linux 2.6 Servers (2005) Dan Yee, Xianneng Shen. Proc CMG 2005

As corporate history has moved on, the first paper is copyrighted by the VERITAS Software Corporation and the second by the Symantec Corporation. I never realised that CMG does not own the content of its own proceedings but there you are. The second paper is a continuation of the first and uses the same methodology and tools. Yes; vxbench.

At first sight this is a little annoying - as another recent paper (which I won't point you at just at the moment because I want to talk about it in more detail in a later post) pointed out, if you can't reproduce a benchmark from its report, it's not really very scientific, and I'm sure that's not what the authors of these papers intended. This need for the rigor imposed by writing reproducibility into benchmarking papers is one reason why people working in the field often resort to the "usual suspects" when looking for load generators - iozone, postmark, bonnie++. They all have their weaknesses but are at least available on the net.

So I set about tracking down vxbench. The header in the source code was not encouraging: "This software contains confidential information and trade secrets of VERITAS Software. Use, disclosure or reproduction is prohibited without prior express written permission of VERITAS Software". Well, I won't be sharing any more of the contents of vxbench.c with you, that's for sure. Onward!

to ... a Veritas support document pointed out to me by the README that comes with the package. Apparently you can download the package from the VERITAS ftp site (without the need to purchase media and/or a license). The support document was no more encouraging than the source header: "These tools are designed to be used under the direction of a VERITAS Technical Support Engineer only." Does this mean you shouldn't use them in other circumstances? (For "other circumstances" read "benchmarking against competing vendors of storage software".) Well, it seems you can. Document 261451 leaves out the sentence that follows the one I've quoted, but in the README.VRTSspt it continues on: "Any other use of these tools is at your own risk." So you can amuse yourself with vxbench and publish the results, but if you fry your disks and panic your system you have only yourself to blame.

Vxbench is a useful tool. Its availability is important - the implementors of Linux LVM (and VxVM!) will no doubt want to study these papers and work to improve their products. I'm glad Symantec continue to make it available to the storage software community.

Wednesday Dec 28, 2005

Getting to grips with NTFS - Part II

I have got fed up with the funeral march that is talking to the KVM switch that controls the Windows 2003 Server I am doing testing on via HTTP - see last post. I have installed Microsoft's Windows Services for Unix, which includes the Unix shell utilities, NFS, pthreads (more of which later), various other bits and pieces and, importantly for me, a Telnet daemon. I did scratch my head at the name "Windows Services for Unix". Shouldn't that be "Unix Services for Windows"? No matter, I have my backslash and pipe symbol back - although a DOS shell via telnet is a very strange place to be for a bear of little brain like me. There are ps(1) and vi(1) and Lord knows what else I haven't discovered yet. I take my hat off to these fellows. They have done a good job. One might almost consider...No. Of course not.

Another tool I installed along the way was BGInfo from Sysinternals. If you tend to weave your way among a lot of MS boxes, having some basic information on the desktop is a great help.

Anyway, as promised, that sample NTFS filesystem creation session, brought to you via the technical marvel we know as Telnet:

*===============================================================
Welcome to Microsoft Telnet Server.
*===============================================================

C:\home\dominika>diskpart

Microsoft DiskPart version 5.2.3790.1830
Copyright (C) 1999-2001 Microsoft Corporation.
On computer: VA64-2OC

DISKPART> list volume

  Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
  ----------  ---  -----------  -----  ----------  -------  ---------  --------
  Volume 0     S                FAT32  Stripe      1800 MB  Healthy
  Volume 1     D                       DVD-ROM         0 B  Healthy
  Volume 2     C                NTFS   Partition     24 GB  Healthy    System

DISKPART> select volume 0
Volume 0 is the selected volume.

Microsoft Gripe O' The Day. That "Volume 0" is the one I am testing on - incrementing the underlying number of spindles and watching the performance deltas, especially between FAT32 and NTFS. The gotcha is that if I delete it and recreate it, it might not be Volume 0 anymore. It might be Volume 2. Yesterday Drive C was Volume 0 and Drive S was Volume 2 until I rebooted. Then it changed again. As the volume is the object you have to manipulate, this makes it devilishly difficult to script this stuff. The rest is pretty straightforward;

DISKPART> delete volume

DiskPart successfully deleted the volume.
DISKPART> list disk

  Disk ###  Status      Size     Free     Dyn  Gpt
  --------  ----------  -------  -------  ---  ---
  Disk 0    Online        75 GB    25 GB
  Disk 1    Online        75 GB      0 B
  Disk 2    Online        34 GB    34 GB   *
  Disk 3    Online        34 GB    34 GB   *
  Disk 4    Online        34 GB    34 GB   *
  Disk 5    Online        34 GB    34 GB   *
  Disk 6    Online        34 GB    34 GB   *
  Disk 7    Online        34 GB    34 GB   *

DISKPART> create volume stripe disk=2,3,4,5,6,7 size=300
DiskPart successfully created the volume.

The asterisk in the Dyn column indicates that we are creating volumes spanning "dynamic disks" - an abstraction layer allowing us to spread multiple volumes over arbitrary disks. You can't have a "stripe" of one disk; that should be "simple". The size=300 indicates how much space (MB) you want on each disk. This (as I noted in my last post) is not the stripe width, over which we have no control.

DISKPART> assign letter=s
DiskPart successfully assigned the drive letter or mount point.

DISKPART> exit
Leaving DiskPart...

C:\home\dominika>format s: /fs:fat32 /v:TestFS
The type of the file system is RAW.
The new file system is FAT32.

WARNING, ALL DATA ON NON-REMOVABLE DISK
DRIVE S: WILL BE LOST!
Proceed with Format (Y/N)? y
Verifying 1800M
Initializing the File Allocation Table (FAT)...
Format complete.

1,883,738,112 bytes total disk space.
1,883,734,016 bytes available on disk.

        4,096 bytes in each allocation unit.
      459,896 allocation units available on disk.

           32 bits in each FAT entry.

Volume Serial Number is A07E-7B4E

C:\home\dominika> iozone -a -z -i0 -f s:\IozoneTest -b c:\t\f6.wks > c:\t\f6.out 2> c:\t\f6.err

...and we are up and away. It's worth saying that DISKPART does a lot of stuff asynchronously, so it's worth drawing breath between activity inside its shell and outside or else it gets in a muddle; particularly so if you are passing commands in via scripts interspersed with other command-line utilities that interact with the I/O subsystem - this can get it very confused, as can multiple concurrent DISKPART sessions.
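
A hedged sketch of that scripted flow (the script and output paths are made up; the ping is just a crude delay, since Windows Server 2003 has no built-in sleep command, and echo y answers Format's confirmation prompt):

rem stripe.txt contains the DISKPART commands shown above, one per line
diskpart /s c:\t\stripe.txt

rem crude ten-second pause to let DISKPART's asynchronous work settle
ping -n 11 127.0.0.1 > nul

rem format the new volume non-interactively
echo y| format s: /fs:ntfs /v:TestFS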

The numbers generated by this little escapade are interesting and not what you might expect. Later.

Tuesday Dec 20, 2005

Getting To Grips With NTFS

This blog entry is for the amusement of Solaris folks. Seasoned enterprise Windows administrators, look away.

Figure 1: The Windows Server 2003 Disk Management GUI - Looking Glass, it ain't!

For reasons best not gone into, I have to do I/O benchmarking on Windows Server 2003. To be fair, the command shell language has grown up a lot since I used to teach it to help-desk unfortunates 10 years ago. Scarily, it has adopted Unix-style I/O redirection (e.g. "2>&1" and so forth) and flow control, making it the bastard son of Kornshell and DOS Batch. Hmm. Variable substitution is still reassuringly hellish though.

Configuring up filesystems is not a million miles from format(1M) and metainit(1M). In order to make it more interesting I am working with a server connected to a KVM switch which exports the Windows screen over HTTP. This has some interesting effects:

  • My mouse pointer and the mouse pointer on the server get further and further from each other over time (presumably due to lost packets on the net). This results in quite a performance - a sort of mouse driven tai-chi in very slow motion.
  • I am working on a laptop running Windows XP. When I went for a coffee break, the server screensaver came on - to log back in I have to give the three fingered salute (ctrl-alt-del). In order to do this without rebooting my laptop I had to walk over to a neighbouring Sunray! It was only later I discovered that the makers of the KVM software have actually thought of that and included menu options for sending that key combination.
  • The backslash character is completely unmapped. I have tried every combination of characters on my keyboard and all possible Alt-nnn combinations. I am reduced to running scripts where I have cut and pasted the hateful character in by hand. Ditto the pipe symbol.

Anyway, enough whinging. How is it going? Well. To start with, it's worth saying that S2003 comes with a graphical volume manager (Control Panel -> Administrative Tools -> Computer Management -> Disk Management). This, as the Help menu item will re-assure you, is provided to Microsoft by the VERITAS Software Corporation, about which I have written before. Now you have found the GUI, forget you ever saw it, for two reasons:

  • It's hobbled. Put more delicately, it "prohibits you from inadvertently performing actions that may result in data loss". I have to confess I have not even bothered to explore what actions these might be. Bring 'em on.
  • As GUIs go, it's not very good. This is strange because Veritas are adept at GUIs that have to deal with multiple layers of abstraction. This one is firmly two-dimensional. The idea of configuring up filesystems from a couple of hundred spindles using this tool was not a pleasant one. Throw it away and...

Real men in this part of the operating system forest use Diskpart - a command shell which you can use interactively, or you can pass it scripts on the command line. This is combined with the Format command, so named to confuse Solaris folk. Diskpart plays the role of format(1M) and metainit(1M). Format plays the role of newfs(1M) or mkfs(1M). Do keep up!

Interaction

I was going to list a quick sample of the filesystem creation process but the KVM web server makes it all too painful to gather the data. Instead, I feel I probably haven't irritated Jonathon enough by 'fessing up to using XP (my two years at Dell with a Sun Ultra 1 Creator on my desk for similar purposes caused much management wailing and gnashing of teeth), so I'm going to whole-heartedly recommend Bill Stanek's Windows Command Line Pocket Book from Microsoft Press. If you were brought up in a Unix shell, starting here will cut through a ton of larger books.

The Striping Conundrum

Anyone starting I/O tuning wants to match the modal read and write request of their application to the capabilities of the underlying hardware. The mechanism for achieving this is the volume manager (the functionality of which may or may not be part of the file system but never mind). The mechanics of QFS for example are set out here and its counterpart for Solaris Volume Manager is here. Similar information for Veritas Volume Manager is in this large PDF manual. The point I am making is that anyone with the vaguest interest in tuning their I/O subsystem ends up in this section of the manual for the product of choice. With Windows Server 2003, it was quite hard to find anything on this topic. My web search revealed many pages with bland reassurances similar to:

"With a striped volume, data is divided into blocks and spread in a fixed order among all the disks in the array, similar to spanned volumes. Striping writes files across all disks so that data is added to all disks at the same rate."

This really tells me nothing, or rather it begs more questions than it answers: How big are these blocks? What is the order (round-robin, parallel, ...)? How do I change these things? Tantalising, isn't it? More digging revealed this gnomic utterance:

"For Windows Server 2003, the size of each stripe is 64 kilobytes (KB)."

That's it. End of story. I expected more, I really did - especially as, as I mentioned, it's got Veritas written on the label. If I'm missing something (another manual? strange Registry witchcraft?) please let me know (I'm Dominic Kay, I work at Sun; take a wild guess at the email address). I know I could download Mark Russinovich and Bryce Cogswell's debugging kit and plough the stack traces to reverse engineer the I/O subsystem for myself, but you know that might be cheating. I've looked at the Enterprise, Datacentre and Storage versions of Windows Server 2003 and I can't see the vital difference from the vanilla flavour I'm looking for. By the way, the semantic splicing of those product names is a bit frown inducing, like "Large", "Big" and "Not Small".
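
One knob that is exposed, for what it's worth, is the NTFS allocation unit size, which you can at least line up with that fixed 64 KB stripe when formatting - a hedged sketch, reusing the s: drive letter from the session above:

C:\home\dominika>format s: /fs:ntfs /v:TestFS /a:64k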

So I am left with two possibilities as to why there are no tuning knobs for the Windows Server 2003 volume manager:

  1. What we have here is "Volume Manager Lite" and storage seriouzos must upgrade to er...Veritas Volume Manager, maybe.
  2. This volume manager is so advanced, so weighed down by I/O pattern discovery heuristics and self tuning algorithms that it just does not need tuning. To meddle with this piece of software would be like taking a pasting brush to the Sistine Chapel.

We shall see. Talk to you later.
