Wednesday Dec 28, 2005

Getting to grips with NTFS - Part II

I have got fed up with the funeral march that is talking, via HTTP, to the KVM switch that controls the Windows 2003 Server I am testing on (see last post). I have installed Microsoft's Windows Services for Unix, which includes the Unix shell utilities, NFS, pthreads (more of which later), various other bits and pieces and, importantly for me, a Telnet daemon. I did scratch my head at the name "Windows Services for Unix". Shouldn't that be "Unix Services for Windows"? No matter, I have my backslash and pipe symbol back - although a DOS shell via telnet is a very strange place to be for a bear of little brain like me. There is ps(1) and vi(1) and Lord knows what else I haven't discovered yet. I take my hat off to these fellows. They have done a good job. One might almost consider... No. Of course not.

Another tool I installed along the way was BGInfo from Sysinternals. If you tend to weave your way among a lot of MS boxes, having some basic information on the desktop is a great help.

Anyway, as promised, that sample NTFS filesystem creation session, brought to you via the technical marvel we know as Telnet:

*===============================================================
Welcome to Microsoft Telnet Server.
*===============================================================

C:\home\dominika>diskpart

Microsoft DiskPart version 5.2.3790.1830
Copyright (C) 1999-2001 Microsoft Corporation.
On computer: VA64-2OC

DISKPART> list volume

  Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
  ----------  ---  -----------  -----  ----------  -------  ---------  --------
  Volume 0     S                FAT32  Stripe      1800 MB  Healthy
  Volume 1     D                       DVD-ROM         0 B  Healthy
  Volume 2     C                NTFS   Partition     24 GB  Healthy    System

DISKPART> select volume 0
Volume 0 is the selected volume.

Microsoft Gripe O' The Day. That "Volume 0" is the one I am testing on - incrementing the underlying number of spindles and watching the performance deltas, especially between FAT32 and NTFS. The gotcha is that if I delete it and recreate it, it might not be Volume 0 any more. It might be Volume 2. Yesterday Drive C was Volume 0 and Drive S was Volume 2 until I rebooted. Then it changed again. As the volume is the object you have to manipulate, this makes it devilishly difficult to script this stuff. The rest is pretty straightforward:

DISKPART> delete volume

DiskPart successfully deleted the volume.
DISKPART> list disk

  Disk ###  Status      Size     Free     Dyn  Gpt
  --------  ----------  -------  -------  ---  ---
  Disk 0    Online        75 GB    25 GB
  Disk 1    Online        75 GB      0 B
  Disk 2    Online        34 GB    34 GB   *
  Disk 3    Online        34 GB    34 GB   *
  Disk 4    Online        34 GB    34 GB   *
  Disk 5    Online        34 GB    34 GB   *
  Disk 6    Online        34 GB    34 GB   *
  Disk 7    Online        34 GB    34 GB   *

DISKPART> create volume stripe disk=2,3,4,5,6,7 size=300
DiskPart successfully created the volume.

The asterisk in the Dyn column indicates that we are creating volumes spanning "dynamic disks" - an abstraction layer allowing us to spread multiple volumes over arbitrary disks. You can't have a "stripe" of one disk; that should be "simple". The size=300 indicates how much space (in MB) you want on each disk, so six disks give the 1800 MB volume seen here. This (as I noted in my last post) is not the stripe width, over which we have no control.
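Since the point of the exercise is to step the number of spindles up and down, the per-disk arithmetic and the script text can be generated rather than typed. What follows is a minimal Python sketch, not anything DiskPart provides: the function names are hypothetical, and the only DiskPart commands it emits are the ones shown in this session.

```python
def stripe_script(disks, size_mb, letter):
    """Build DiskPart script text for one striped volume.

    Hypothetical helper: it only strings together the commands
    used in the interactive session above.
    """
    if len(disks) < 2:
        # A "stripe" of one disk is not allowed; that would be "simple".
        raise ValueError("a striped volume needs two or more disks")
    disk_list = ",".join(str(d) for d in disks)
    lines = [
        f"create volume stripe disk={disk_list} size={size_mb}",
        f"assign letter={letter}",
        "exit",
    ]
    return "\n".join(lines) + "\n"


def stripe_capacity_mb(disks, size_mb):
    """size= is per disk, so capacity is size times the disk count."""
    return len(disks) * size_mb


if __name__ == "__main__":
    disks = [2, 3, 4, 5, 6, 7]
    print(stripe_script(disks, 300, "s"), end="")
    print(f"capacity: {stripe_capacity_mb(disks, 300)} MB")
```

Six disks at 300 MB each accounts for the 1800 MB that format reports; dropping disks from the list is then all it takes to test a narrower stripe.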

DISKPART> assign letter=s
DiskPart successfully assigned the drive letter or mount point.

DISKPART> exit
Leaving DiskPart...

C:\home\dominika>format s: /fs:fat32 /v:TestFS
The type of the file system is RAW.
The new file system is FAT32.

WARNING, ALL DATA ON NON-REMOVABLE DISK
DRIVE S: WILL BE LOST!
Proceed with Format (Y/N)? y
Verifying 1800M
Initializing the File Allocation Table (FAT)...
Format complete.

1,883,738,112 bytes total disk space.
1,883,734,016 bytes available on disk.

        4,096 bytes in each allocation unit.
      459,896 allocation units available on disk.

           32 bits in each FAT entry.

Volume Serial Number is A07E-7B4E

C:\home\dominika> iozone -a -z -i0 -f s:\IozoneTest -b c:\t\f6.wks > c:\t\f6.out 2> c:\t\f6.err

...and we are up and away. It's worth saying that DISKPART does a lot of stuff asynchronously, so it's worth drawing breath between activity inside its shell and outside or else it gets in a muddle. This is particularly so if you are passing commands in via scripts interspersed with other command-line utilities that interact with the I/O subsystem - this can get it very confused, as can multiple concurrent DISKPART sessions.
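One way round the shifting volume numbers griped about above is never to hard-code them: capture the output of list volume and look the number up by drive letter just before each select. Here is a minimal Python sketch, assuming the fixed-width layout in the listing above holds; the function names are hypothetical and the sample text is copied from that listing.

```python
import re

SAMPLE = """\
  Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
  ----------  ---  -----------  -----  ----------  -------  ---------  --------
  Volume 0     S                FAT32  Stripe      1800 MB  Healthy
  Volume 1     D                       DVD-ROM         0 B  Healthy
  Volume 2     C                NTFS   Partition     24 GB  Healthy    System
"""


def parse_list_volume(text):
    """Return one dict per volume row, keyed by the column headings."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        # The ruler line of dashes defines the column positions.
        if i > 0 and stripped and set(stripped) <= {"-", " "}:
            break
    else:
        return []  # no ruler line, nothing to parse
    spans = [m.span() for m in re.finditer(r"-+", line)]
    names = [lines[i - 1][a:b].strip() for a, b in spans]
    rows = []
    for row in lines[i + 1:]:
        if not row.strip():
            break  # a blank line ends the table
        rows.append({n: row[a:b].strip() for n, (a, b) in zip(names, spans)})
    return rows


def volume_for_letter(text, letter):
    """Return the volume number currently holding a drive letter, or None."""
    for row in parse_list_volume(text):
        if row.get("Ltr", "").upper() == letter.upper():
            # The first column holds text such as "Volume 0".
            return int(row["Volume ###"].split()[1])
    return None


if __name__ == "__main__":
    print(volume_for_letter(SAMPLE, "S"))
```

The lookup has to be repeated after every delete, create or reboot, since that is exactly when the numbers move.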

The numbers generated by this little escapade are interesting and not what you might expect. Later.

Tuesday Dec 20, 2005

Getting To Grips With NTFS


This blog entry is for the amusement of Solaris folks. Seasoned enterprise Windows administrators, look away.

Figure 1: The Windows Server 2003 Disk Management GUI - Looking Glass, it ain't!

For reasons best not gone into, I have to do I/O benchmarking on Windows Server 2003. To be fair, the command shell language has grown up a lot since I used to teach it to help-desk unfortunates 10 years ago. Scarily, it has adopted Unix-style I/O redirection (e.g. "2>&1" and so forth) and flow control, making it the bastard son of Kornshell and DOS Batch. Hmm. Variable substitution is still reassuringly hellish though.

Configuring up filesystems is not a million miles from format(1M) and metainit(1M). In order to make it more interesting I am working with a server connected to a KVM switch which exports the Windows screen over HTTP. This has some interesting effects:

  • My mouse pointer and the mouse pointer on the server get further and further from each other over time (presumably due to lost packets on the net). This results in quite a performance - a sort of mouse driven tai-chi in very slow motion.
  • I am working on a laptop running Windows XP. When I went for a coffee break, the server screensaver came on - to log back in I have to give the three fingered salute (ctrl-alt-del). In order to do this without rebooting my laptop I had to walk over to a neighbouring Sunray! It was only later I discovered that the makers of the KVM software have actually thought of that and included menu options for sending that key combination.
  • The backslash character is completely unmapped. I have tried every combination of characters on my keyboard and all possible Alt-nnn combinations. I am reduced to running scripts where I have cut and pasted the hateful character in by hand. Ditto the pipe symbol.

Anyway, enough whinging. How is it going? Well. To start with, it's worth saying that S2003 comes with a graphical volume manager (Control Panel -> Administrative Tools -> Computer Management -> Disk Management). This, as the Help menu item will reassure you, is provided to Microsoft by the VERITAS Software Corporation, about which I have written before. Now you have found the GUI, forget you ever saw it for two reasons:

  • It's hobbled. Put more delicately it "prohibits you from inadvertently performing actions that may result in data loss". I have to confess I have not even bothered to explore what actions these might be. Bring 'em on.
  • As GUIs go, it's not very good. This is strange because Veritas are adept at GUIs that have to deal with multiple layers of abstraction. This one is firmly two dimensional. The idea of configuring up filesystems from a couple of hundred spindles using this tool was not a pleasant one. Throw it away and...

Real men in this part of the operating system forest use Diskpart - a command shell which you can use interactively or pass scripts to on the command line. This is combined with the Format command, so named to confuse Solaris folk. Diskpart plays the role of format(1M) and metainit(1M). Format plays the role of newfs(1M) or mkfs(1M). Do keep up!

Interaction

I was going to list a quick sample of the filesystem creation process but the KVM web server makes it all too painful to gather the data. Instead, I feel I probably haven't irritated Jonathan enough by 'fessing up to using XP (my two years at Dell with a Sun Ultra 1 Creator on my desk for similar purposes caused much management wailing and gnashing of teeth) so I'm going to whole-heartedly recommend Bill Stanek's Windows Command Line Pocket Book from Microsoft Press. If you were brought up in a Unix shell, starting here will cut through a ton of larger books.

The Striping Conundrum

Anyone starting I/O tuning wants to match the modal read and write request of their application to the capabilities of the underlying hardware. The mechanism for achieving this is the volume manager (the functionality of which may or may not be part of the file system but never mind). The mechanics of QFS for example are set out here and its counterpart for Solaris Volume Manager is here. Similar information for Veritas Volume Manager is in this large PDF manual. The point I am making is that anyone with the vaguest interest in tuning their I/O subsystem ends up in this section of the manual for the product of choice. With Windows Server 2003, it was quite hard to find anything on this topic. My web search revealed many pages with bland reassurances similar to;

"With a striped volume, data is divided into blocks and spread in a fixed order among all the disks in the array, similar to spanned volumes. Striping writes files across all disks so that data is added to all disks at the same rate."

This really tells me nothing, or rather it begs more questions than it answers: How big are these blocks? What is the order (round-robin, parallel, ...)? How do I change these things? Tantalising, isn't it? More digging revealed this gnomic utterance:

"For Windows Server 2003, the size of each stripe is 64 kilobytes (KB)."

That's it. End of story. I expected more, I really did - especially as, like I mentioned, it's got Veritas written on the label. If I'm missing something (another manual? strange Registry witchcraft?) please let me know (I'm Dominic Kay, I work at Sun; take a wild guess at the email address). I know I could download Mark Russinovich and Bryce Cogswell's debugging kit and plough through the stack traces to reverse-engineer the I/O subsystem for myself but, you know, that might be cheating. I've looked at the Enterprise, Datacentre and Storage versions of Windows Server 2003 and I can't see the vital difference from the vanilla flavour that I'm looking for. By the way, the semantic splicing of those product names is a bit frown-inducing, like "Large", "Big" and "Not Small".
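To make concrete why the stripe size and ordering matter, here is a minimal Python sketch of how a logical offset would map to a disk IF the 64 KB units were laid out round-robin. That ordering is my assumption (it is the conventional RAID-0 arrangement); nothing quoted above confirms it for Windows Server 2003, and the function names are hypothetical.

```python
STRIPE_UNIT = 64 * 1024  # bytes; the one figure the documentation gives


def locate(offset, n_disks, stripe_unit=STRIPE_UNIT):
    """Map a logical byte offset to (disk index, byte offset on that disk).

    ASSUMPTION: stripe units are dealt out round-robin across the disks.
    """
    unit = offset // stripe_unit      # which stripe unit, counting from 0
    disk = unit % n_disks             # round-robin choice of disk
    row = unit // n_disks             # complete stripes before this unit
    return disk, row * stripe_unit + offset % stripe_unit


def disks_touched(offset, length, n_disks, stripe_unit=STRIPE_UNIT):
    """Return the set of disk indices a single request would touch."""
    first = offset // stripe_unit
    last = (offset + length - 1) // stripe_unit
    return {u % n_disks for u in range(first, last + 1)}


if __name__ == "__main__":
    # An aligned 8 KB request versus a 384 KB request on six disks.
    print(sorted(disks_touched(0, 8 * 1024, 6)))
    print(sorted(disks_touched(0, 384 * 1024, 6)))
```

On that assumption an aligned 8 KB request lands on a single spindle while a 384 KB request keeps all six busy, which is precisely the matching of request size to layout that a tunable stripe unit would let me explore.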

So I am left with two possibilities as to why there are no tuning knobs for the Windows Server 2003 volume manager:

  1. What we have here is "Volume Manager Lite" and storage seriouzos must upgrade to er...Veritas Volume Manager, maybe.
  2. This volume manager is so advanced, so weighed down by I/O pattern discovery heuristics and self tuning algorithms that it just does not need tuning. To meddle with this piece of software would be like taking a pasting brush to the Sistine Chapel.

We shall see. Talk to you later.

Friday Sep 09, 2005

Letter to The Chef

Thanks for your kind offer of free alcohol in one of London's top hotels via SMS. That's not an invitation I get often, especially from the head chef. Unfortunately it arrived just as I had turned in for the night 60 miles down the road on the coast. I could have used a pint though. How long has it been now?

My cellphone didn't wake the baby. Did I tell you we had another? That's 3 now. No more. Also we have a spanking new kitchen so I can get on with my baking without swearing all the time. Did I tell you I'm a bread baker? Probably not. I'm thinking about doing it for a living but am too scared to open a boulangerie even though I have the capital to do so. I was inspired by The Handmade Loaf. What else is new in Kay's life?

* My stepfather died at the end of 2003; my dad died at the end of last year; my mum died 4 months ago. I think my family is done with funerals for a while but we have several house jokes which the children are required to recite, one of which is

Q: What makes God laugh?
A: People with plans

Talking of which....

* I lost my job last week after 7 years. My last day at Sun will be 14 October. Guess I'll need a new email address and blog site, huh. It was nothing personal - I was just amongst the number and so am looking for work, principally in the areas of system performance, capacity planning, benchmarking, system and network architectures, and secondarily in technical team leadership and project management.

My resume is here though for reasons of brevity I have left out all the work I did on fishing boats, in French restaurant kitchens and fighting Argentina (for reasons that are still not clear to me, 13 years later). Some papers I have written are here and here. I'm sure you will offer me a start in your kitchen (as Plongeur de Maison, naturally) but hopefully the world of IT has not yet had its fill of me.

* We are going to buy a house in France (with a BIG old stone bread oven). There is symmetry here: my Mum loved the place and her old friends first talked of selling it to me as we drove from the crematorium. Another coincidence is that it is in the town of Descartes and I am hopeless at maths.

* We now have 3 Mercedes with a combined mileage of over half a million miles. The secret is frequent oil changes. Being heir to an oil fortune would also help with the bills. If you hear people saying they don't make them like they used to, believe it. They're still a whole lot more robust than any other brand though.

* Out of the dozen vines in my vineyard in the garden, 4 have fruited. This is year 3 for them so that's not so good. Fortunately the proposed purchase in France has a big field out back so I can desist with this fool's errand and grow them in the climate God provided for the purpose.

Keep in touch. Must run now - I have revision to do as one of my prospective employers expects me to have more than a passing knowledge of TCP internals (never mind).

Dom.

Monday Aug 08, 2005

Queueing Costs Nothing

My boss asked me if there was any training I needed. This is a good sign. Unless of course he is looking for something to pad out my severance package such as a course in bricklaying or preparation for the Microsoft Certified Systems Engineer exams. But, hey, the world will always need bricklayers, right? Anyway I replied drily that I get most of my training from Amazon these days. Which is true but the costs are beginning to match those of training courses. Supply and demand dictates that if you want obscure books on systems modelling you must pay the price? Not necessarily so. Here are some expensive titles, for free.

If you are new to the area, I would say that googling for "Markov Chains" is probably not the best way to start, particularly if, like me, you took a fairly relaxed attitude to your mathematics education. Instead go and read Neil Gunther's quite approachable series of articles at the Teamquest site.

Quantitative System Performance: Computer System Analysis Using Queueing Network Models dates back to 1984. No doubt if it was still in print it would cost the far side of sixty bucks. It's still very relevant, cited frequently, and is yours for the patience to download and unzip it. A gentle introduction to the joys of mean value analysis: I commend it to you.

If, on the other hand, you are slightly better equipped mathematically you might want to jump straight in and get Introduction to Queueing Theory (pdf) by Bob Cooper. This and sundry others are there for the taking at Myron Hlynka's listings.

By now you may well want to play: WinPEPSY (for Windows-enabled readers) is an implementation of PEPSY-QNS. It's a very useful tool for graphically constructing and visualising the parameters for systems of network queues. Here is a screenshot to whet your appetite.

Enjoy.

Tuesday Jul 26, 2005

On Blogs and Bloggers


I am gratified to learn that it isn't just me who is banging his head against the wall with the blogging infrastructure, Roller.

Phil Harman, whom few would describe as a technical slouch, is also vexed. Ditto Richard McDougall. The concept of blogging is brilliant and the way it's been executed on in Sun is marvellous. The gripe is that Roller is a web application. It's a fine effort but it's simply not finished. A keen knowledge of HTML and CSS is required to make forward progress on any but the most minor formatting issues. To which I hear you reply "If you can't even master a trivial markup language and meta-language in order to create your glorified post-it notes, why on earth did they give you a job?" Quite right. Given that I've told my children they can only play computer games if they construct them with the editor and assembler I've supplied, this is hypocrisy of the worst sort.

Another observation is that when initialising a blogspace you get a number of links to other people's blogs "for free". These are, I'm told, the great and the good of the Roller project and so forth and I'm advised to retain them as a mark of respect. No. Nor will I link to Jonathan: everyone else does - he really doesn't need me. Instead I shall save my sycophancy for a select few (several of whom I've never physically met - the joys of iWork!). I will only entertain a few links to other bloggers and the criteria are

  • stringent.
  • completely subject to whim.
  • only discernible by reference to worked examples.

but broadly, they have to be people who have changed the way I think.

Dave Levy

  • Dave convinced me that if I did not blog, my career would never amount to a hill of beans. Amazingly, since I started, I have become as rich as Croesus and am besieged with offers to join fascinating projects. Due to his blogging, no-one ever confuses him with anyone in the Knesset anymore.
  • He makes me work very hard at being a better technologist. He fails because I fail to realise it involves being less "techie". But that's not the point; it's good to work with people who stretch you intellectually and think you are worth bothering to argue with. Here is a picture of him taken in the basement of our building, looking on as his henchmen torture me until I confess the three meanings of the keyword static in C or something; I blacked out at that point. Smile not shown.

Richard McDougall

  • He's Australian and Australians make Brits seethe which is a good thing. These people not only make one of my favourite wines, they know how to have fun - because centuries ago the UK authorities deported anyone having more fun than they were and now the chickens have come home to roost and beat them at cricket. Their soap opera characters have clear skin and laugh off their improbable relationships: UK soap characters always look ill and spend all their time arguing and rueing their social ineptitude. [Soaps from both countries are content-free though: your time is better spent learning HTML and CSS, and only then, possibly, DTrace.]
  • He's a motivating force behind Filebench, more of which later.
  • He was kind enough to entertain (even encourage) the dozens and dozens of pages of fine-grained critique of pre-release drafts of the book. My wife characterised this "contribution" as "anal retentive carping criticism and hankering for a style of grammar and syntax dating back to Dickens". But as a result, I never had to read the book when it came out. Result!
  • A colleague described my banner graphic as "making me look like a gangster" but I was able to retort "If you think I look scary, take a peek at Richard..." I'm not saying this man is the Travis Bickle of the Operating Systems world but would you argue with him? Are you Luca Brasi?
  • His car. What better evidence of certifiable insanity: Lots of Trouble; Usually Serious.

Jon Haslam

  • "There are no questions too stupid to ask; merely some too stupid to answer." This is my view. Fortunately it is not Jon's as he has to fend off a lot of these from me, most of which boil down to me being too lazy to re-read the segmap code on a daily basis (There, I've said it; now I'll never get into PAE).
  • Jon has triplets; small ones. I was blessed with children that arrived at evenly spaced intervals; the amount of sleep had in the Kay household far, far outstrips the Haslam quota. Somehow he not only manages to stay up later than me, but do creative stuff during that time as opposed to...eh...see Soap Operas (above). Whenever I feel like powering off the laptop and sloping off to my pit, I think of Jon.
  • I learned from Jon the importance of presentations as performance art. DTrace is not, in and of itself, a highly amusing subject. And yet....

Adrian Cockcroft

  • Adrian and Jim, below, and Richard, above, used to write articles for a site called SunWorld Online, which I think has disappeared. At the time they were the only source of information on Solaris internals available to jobbing sysadmins like me. I liked them (and the whitepapers) so much I joined the company. Ironically, by the time I got there, Richard Pettit (co-author with Adrian of the SE Toolkit) had already left and Adrian subsequently departed, last seen for sale on eBay. His legacy is of course the Porsche Book and the Capacity Planning Blueprint.
  • A shared interest in Performance and open source tools for measuring and modelling it.

Jim Mauro

  • For the reasons above: he wrote stuff that made my life before Sun more interesting. Had he not written it, I would not have used it and my employers at that time would not have been so wildly successful.
  • Co-author of the book.
  • He's the only person who has ever delivered a definition of "badabing" in an accent I could penetrate; i.e. he wrote it down.

Phil Harman

  • His exactitude: (From an email thread, long discussion of direct I/O elided...) If an application were to decide NOT to turn on O_SYNC or O_DSYNC because it had asked of Direct I/O, it may assume that it is getting synchronous writes when it isn't if someone else turns off Direct I/O. That's all. It's very unlikely. But the question was about safety. I dreamt up the only scenario I could think of where safety was an issue. I can't imagine that Oracle would make this assumption, but I'm not the man with a business running on a 72 core system (I assume it's something bigger than an icecream parlour). And no, the customer involved was not Ben and Jerry.
  • Walking past my desk one day, he looked at what I was reading and then made me entirely rethink my methodology in only three words: "Hmm, Gunther. Quaint" with no further explanation. At all. The turmoil that resulted has filled a whole bookshelf at home. Bizarre - but in a good way.

    So that's it. No-one else. Not never. And I won't even link to these until the libel proceedings have subsided.


Thursday Jul 21, 2005

    Visualising Performance



    There are several things that interest me. Filesystem and datapath software design is one. Computer performance is another; particularly datapath performance of course but also the whole stack. Then there is Open Source software for helping to improve performance: load generators, probes and monitors, mathematical and graphical software for doing such things as statistical manipulation, implementing queuing theory and simulation; that sort of thing. I'm not alone here. Adrian Cockcroft, author of perhaps the primary source on Solaris performance, has blogged on this topic.

    What do I mean by visualising performance? Well, look at the following table, extracted from the Lustre Wiki - data gleaned from a netperf benchmark of 10 gigabit ethernet interfaces, increasing the payload size and the size of the socket buffer:

    MBytes/s, by send size (rows) and socket buffer size (columns)

    Send Size     128K     256K     512K       1M       2M       4M       8M      16M
    ---------   ------   ------   ------   ------   ------   ------   ------   ------
    8K          212.79   260.79   273.72   314.31   362.51   349.24   358.81   376.20
    16K         218.68   259.10   273.53   314.24   362.34   348.82   358.39   376.09
    32K         213.63   260.07   273.29   329.90   362.17   349.00   358.63   376.01
    64K         221.17   263.98   273.31   316.10   361.51   348.74   358.11   375.91
    128K        224.50   266.42   273.96   313.34   362.08   348.88   358.20   376.39
    256K        221.97   260.96   275.27   290.05   361.51   348.68   357.97   376.48
    512K        222.43   265.68   274.28   289.10   361.28   348.95   358.14   376.37
    1M          226.24   266.02   275.66   295.67   361.64   348.70   357.93   376.71

    This is a common enough scenario. There is one dependent variable; the throughput of the connection. There are two independent variables - the size of the socket buffer and the size of the request. I had to look at that table for quite a while before I could see the result - the relationship. This is very common in benchmarking. Often, only two causal factors would be considered to be on the light side; the mount parameters for a filesystem can run to a dozen or more.
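Before a table like this can be plotted it has to be turned from its wide, pretty-printed form into one row per observation: two independent variables and one dependent variable per line. Here is a minimal Python sketch of that reshaping, using two rows copied from the table; the function names are hypothetical, and the column names are simply borrowed from the R code further down.

```python
import csv
import io

SOCKET_BUFFERS_KB = [128, 256, 512, 1024, 2048, 4096, 8192, 16384]

# Two rows copied from the table above: send size (KB) -> MBytes/s
WIDE = {
    8: [212.79, 260.79, 273.72, 314.31, 362.51, 349.24, 358.81, 376.20],
    1024: [226.24, 266.02, 275.66, 295.67, 361.64, 348.70, 357.93, 376.71],
}


def to_long(wide, buffers_kb):
    """Yield (send_kb, soc_buf, mbs) tuples, one per table cell."""
    for send_kb, row in sorted(wide.items()):
        if len(row) != len(buffers_kb):
            raise ValueError(f"row {send_kb} has {len(row)} cells")
        for soc_buf, mbs in zip(buffers_kb, row):
            yield send_kb, soc_buf, mbs


def to_csv(wide, buffers_kb):
    """Return the long-format table as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["send_kb", "soc_buf", "mbs"])
    writer.writerows(to_long(wide, buffers_kb))
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(WIDE, SOCKET_BUFFERS_KB), end="")
```

Each cell becomes one line of CSV, so a dozen mount parameters would just mean a dozen columns rather than a table nobody can read.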

    OK, so this example is not one that is going to set the world alight but it's in the public domain, which helps. I have to get drunk with people who, in terms of scientific visualisation, have bigger fish to fry. But these days we (Sun) have bigger fish on the chopping board - especially petabyte storage and grids; both of the compute and storage varieties. You cannot build these things in the lab on a whim; you have to model and modelling means visualisation.

    I found this graph more intuitive:

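    require(lattice)

    # Pick the CSV via a file dialog; sep="," because read.table
    # splits on whitespace by default.
    g_data <- read.table(fileName <- choose.files("*.csv"),
       header=T, sep=",")

    # Throughput as a surface over the two independent variables
    print(wireframe(g_data$mbs ~ g_data$soc_buf
       * g_data$send_kb,
    	zlab="MB/s" ,
    	ylab="Send size (KB)" ,
    	xlab="Socket buffer size (KB)" ,
    	drape = TRUE,
    	colorkey = TRUE
    	) )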
    

    The code above is for R, a free software environment for statistical computing and graphics, more of which below. I think the key message is "This is not a lot of code" (to 'fess up, I did have to deprocess the pretty-printed table back to CSV). So this more or less tells us that one of the variables has little effect. But we can do better than this:

    g_data <- read.table(fileName <- choose.files("*.csv"), header=T, sep=",")
    print(splom( ~ g_data))
    

    This gives us a scatterplot matrix. In two lines of code we can compare the relationship between every variable in the test and the relationships leap from the page. In our case there are only three dimensions but trellis graphics (in S-Plus, the commercial version) or lattice graphics (in R) allow us several graphical methods to visually explore our data.

    What does it tell us? That after a certain point, increasing the size of the buffer provides no further boost in throughput. This is important as kernel memory is a finite resource.
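The "certain point" can be pinned down with a few lines of arithmetic on the table. This is a minimal Python sketch using the 8K and 1M send-size rows; the function names are hypothetical and the 10% threshold is an arbitrary choice of mine, not something the benchmark defines.

```python
SOCKET_BUFFERS_KB = [128, 256, 512, 1024, 2048, 4096, 8192, 16384]

# The 8K and 1M send-size rows from the table above, in MBytes/s.
ROWS = [
    [212.79, 260.79, 273.72, 314.31, 362.51, 349.24, 358.81, 376.20],
    [226.24, 266.02, 275.66, 295.67, 361.64, 348.70, 357.93, 376.71],
]


def column_means(rows):
    """Mean throughput for each socket buffer size."""
    return [sum(col) / len(col) for col in zip(*rows)]


def relative_gains(values):
    """Fractional change from each value to the next."""
    return [(b - a) / a for a, b in zip(values, values[1:])]


def knee(buffers_kb, means, threshold=0.10):
    """Smallest buffer after which no doubling gains more than threshold."""
    gains = relative_gains(means)
    for i in range(len(gains)):
        if all(g <= threshold for g in gains[i:]):
            return buffers_kb[i]
    return buffers_kb[-1]


if __name__ == "__main__":
    means = column_means(ROWS)
    print(knee(SOCKET_BUFFERS_KB, means))
```

For these two rows the knee comes out at the 2M buffer: everything beyond it costs eight times the kernel memory for a gain of a few percent.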

    Then it's just a matter of drilling down for the "management summary" (But 'fessing up again, I am daintily sidestepping the thorny topic of non-linear regression analysis. Another day.):

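    # Column names as in the wireframe call above; data= tells
    # xyplot where to find them.
    xyplot(mbs ~ soc_buf ,
    	data = g_data,
    	aspect = "xy",
    	ylab = "MB/s" ,
    	xlab = "Socket buffer size (KB)")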
    

    So then, my elevator pitch for R.

    • It's free (as in speech, not beer, yadda yadda). There is a good community around it.
    • It has vector maths and matrices built in so no more loops, nested loops, nested nested...[repeat 'till fade].
    • All the regression, correlation, smoothing, modelling mathematical grind and all the presentation graphics have already been attended to.
    • It interfaces to Java (and C, and [your language shared library of choice here]).
    • It is object orientated which is handy for someone who wants to represent e.g. a storage array or compute node both as a piece of graphics (icon, connectors, etc) and as a chunk of maths.
    • It's home from home for those that like a command line environment. Intractable (write-only) code to rival Perl can be written if one leans to the beard-stroking, sandal-wearing edge of the technology community.
    • It incorporates the TCL/Tk libraries so you can write fully formed standalone GUI applications in it.

    When all is said and done, it's really good for performance & capacity planning "exploration"; later on I'll measure an elephant for you in pretty quick time in R.

    So endeth my first blog; respect and gratitude to David Levy for requisite motivational arse kicking and Simon Dachtler for finding time to produce my banner graphic while still keeping the Far-East manufacturing economy ticking over.

