Wednesday Mar 05, 2014

Random or Sequential I/O? How to find out with DTrace

One of the perennial performance or sizing questions for a workload is how much I/O it generates. This can be broken down several ways:

  • Network versus disk
  • Read versus write
  • Random versus sequential (for the disk component)

Many of these metrics have always been easy to quantify. There are standard operating system tools to measure disk I/O, network packets and bytes, etc. The question of how much of the I/O is random versus sequential is much harder to answer, but it can be an important one if your storage is a potential limiter of performance and that storage has a significant amount of "conventional" disk - i.e. spindles of rotating rust.

Sequential I/O on conventional disks can generally be served at a higher throughput because conventional disks can avoid almost all rotational delay and seek time penalties. Random I/O will always incur these penalties, at a varying rate.

So if you need to decompose your workload enough to understand how much Random versus Sequential I/O there is, what do you do? You may be able to discuss it with the application developer to get their take on how the application behaves, what it stores, and how and when it needs to fetch data into memory or write or update it on disk. This information (or access to the developer) is not always available though.

What if I could get inside the application and measure the I/O as it happens? That is a task for DTrace. I have developed a script that tallies up all I/O for all filesystems and can tell you how much of the I/O is Sequential (a read or write at a location in a file is then followed by a read or write at the next location for that file) or Random.
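To make the sequential/random definition concrete, here is a minimal sketch in Python. It is not the DTrace script (which is not reproduced here); the function name, the event tuple layout and the choice to count the first access to a file as random are all assumptions made for illustration.

```python
from collections import defaultdict

def classify_io(events):
    """Tally sequential versus random operations.

    events: iterable of (file_id, offset, length) tuples, in the order issued.
    An operation counts as sequential when it starts exactly where the
    previous operation on the same file ended. Everything else, including
    the first access to a file, counts as random (an assumption of this sketch).
    """
    next_offset = {}              # file_id -> offset just past the last I/O
    counts = defaultdict(int)
    for file_id, offset, length in events:
        kind = "sequential" if next_offset.get(file_id) == offset else "random"
        counts[kind] += 1
        next_offset[file_id] = offset + length
    return dict(counts)

# Hypothetical trace: file "a" is read front to back, file "b" is read out of order.
events = [
    ("a", 0, 8192), ("a", 8192, 8192), ("a", 16384, 8192),
    ("b", 4096, 512), ("b", 0, 512),
]
print(classify_io(events))
```

The only state needed per file is the offset that the next sequential operation would start at, which is why this kind of tally is cheap enough to keep for every file being accessed.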

Here is some example output from the script, where I first ran a Sequential filebench test:

Sample interval: 120 seconds
End time: 2011 Mar 23 10:42:20

FYI: Page I/O on VREG vnodes after read/write:
  fop_read      fop_getpage      2

/u                         rd/s    wr/s   rdKB/s   wrKB/s   acc/look:       0
             sequential 2616.20 0.62501  2676408 0.718235    readdir:       0
                 random 2.65836 0.20834 2713.623 0.641998  geta/seta:       0
                   page 0.00833       0 0.033334        0      cr/rm:       0
                  TOTAL 2618.86 0.83334  2679122 1.360233      other: 2.86669

Then I ran a random filebench test:

Sample interval: 60 seconds
End time: 2011 Mar 22 12:09:28

FYI: Page I/O on VREG vnodes after read/write:
  fop_write      fop_putpage      1
  fop_read       fop_getpage      5
  fop_write      fop_getpage     18

/u                         rd/s    wr/s   rdKB/s   wrKB/s   acc/look: 2.28871
             sequential 157.036 161.446 1255.552 1118.578    readdir:       0
                 random 20113.5 17119.0 160860.2 136919.0  geta/seta:       0
                   page       0 4.27672        0 17.10687      cr/rm: 0.05012
                  TOTAL 20270.6 17284.7 162115.7 138054.7      other: 5.26237

As you can see the script outputs a table breaking down reads and writes by operations and by KB/s, but also by "sequential", "random" and "page". There are also some totals, plus some statistics on a few other VFS operations.
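The per-row rates can be combined into a couple of derived figures, such as the share of reads that were random and the average read size. The sketch below uses the rd/s and rdKB/s values copied from the two sample tables above; the dictionary layout and the function name are hypothetical.

```python
# (rd/s, rdKB/s) per row, copied from the two sample tables above.
runs = {
    "sequential test": {
        "sequential": (2616.20, 2676408.0),
        "random": (2.65836, 2713.623),
        "TOTAL": (2618.86, 2679122.0),
    },
    "random test": {
        "sequential": (157.036, 1255.552),
        "random": (20113.5, 160860.2),
        "TOTAL": (20270.6, 162115.7),
    },
}

def read_summary(rows):
    """Return the percentage of random reads and the average read size in KB."""
    total_ops, total_kb = rows["TOTAL"]
    return {
        "pct_random_reads": 100.0 * rows["random"][0] / total_ops,
        "avg_read_kb": total_kb / total_ops,
    }

for name, rows in runs.items():
    s = read_summary(rows)
    print(f"{name}: {s['pct_random_reads']:.1f}% random reads, "
          f"{s['avg_read_kb']:.1f} KB average read")
```

With these figures the random test comes out at about 99% random reads averaging about 8 KB each, while the sequential test is almost entirely sequential with reads averaging about 1 MB.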

What does "page" mean? Well, this script uses Solaris' virtual filesystem interface to measure filesystem I/O. Through this interface we can see conventional read and write operations, however there are also "page" operations that can happen which do not necessarily map to a read or write from an application. Because of this uncertainty I decided to list the I/O via these page operations separately. They will generally be small for an application that uses conventional I/O calls, but may be large if an application is using the mmap() interface, for example.
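For readers unfamiliar with the difference, the following sketch shows the two access styles side by side in Python. It only illustrates the application side: the file contents and names are made up, and how the operating system accounts for each style is as described above, not something this code measures.

```python
import mmap
import os
import tempfile

payload = b"x" * 8192   # made-up file contents

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

try:
    # Conventional I/O: the application asks for the data with read calls.
    with open(path, "rb") as f:
        via_read = f.read()

    # Memory-mapped I/O: the file is mapped into the address space and its
    # pages are brought in when the memory is touched, with no read call.
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        via_mmap = m[:]
finally:
    os.unlink(path)

print(via_read == via_mmap)
```

Both styles return the same bytes; the difference is in how the data gets into memory, which is why the second style would be expected to appear under "page" rather than under the read and write rows.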

For the purposes of this blog entry I have simplified the output of the script. It normally outputs a table for every filesystem that has I/O during its run. The script could easily be modified to select a desired filesystem. It could also be modified to select I/O related to a specific user, process, etc.


Monday Feb 24, 2014

nicstat update - version 1.95

Another minor nicstat release is now available.

Changes for Version 1.95, January 2014

Common

  • Added "-U" option, to display separate read and write utilization.
  • Simplified display code regarding "-M" option.

Solaris

  • Fixed fetch64() to check type of kstats
  • Fixed memory leak in update_nicdata_list()

Availability

nicstat source and binaries are available from sourceforge.

History

For more history on nicstat and examples of use, see my earlier entry.

Monday Nov 12, 2012

How to Get Vim to do Filename Completion Even When You are Root

From the Obscure Unix Admin Tip of the Day section...

If you occasionally edit files as root (I never do, I always use pfexec, wink wink), then you may have noticed that the vim (Vi Improved) editor that normally does filename completion via the <Tab> key now gives you something like:

:e /etc/mo^I

when you try to open up /etc/motd with a little less typing.

So, there are at least five solutions to this:

  • Use <Ctrl>-E instead of <Tab>
  • Use the "-N" flag when you start Vim
  • :set wildchar=<Tab> (Enter those 5 characters, not an actual Tab)
  • Use pfexec vim filename
  • In Solaris 11.1 or later, use pfedit filename

The reason for this? It seems that when you are root, Vim sets its "compatible" flag, which makes it behave more like its ancestor vi. In turn this makes Vim set 'wildchar' to <Ctrl>-E.

For more info, read the section you get when you enter :help cmdline-completion

Thanks to Darren for the tip on pfexec

Friday Aug 24, 2012

Mercurial says "nothing changed", but it did. Sometimes my software is too clever.

It seems I have found a "bug" in Mercurial. It takes a shortcut when checking for differences in tracked files. If the file's size and modification time are unchanged, it assumes its contents are unchanged:

$ hg init .
$ cp -p .sccs2hg/2005-06-05_00\:00\:00\,nicstat.c nicstat.c
$ ls -ogE nicstat.c
-rw-r--r--   1   14722 2012-08-24 11:22:48.819451726 -0700 nicstat.c
$ hg add nicstat.c
$ hg commit -m "added nicstat.c"
$ cp -p .sccs2hg/2005-07-02_00\:00\:00\,nicstat.c nicstat.c
$ ls -ogE nicstat.c
-rw-r--r--   1   14722 2012-08-24 11:22:48.819451726 -0700 nicstat.c
$ hg diff
$ hg commit
nothing changed
$ touch nicstat.c
$ hg diff
diff -r b49cf59d431d nicstat.c
--- a/nicstat.c	Fri Aug 24 11:21:27 2012 -0700
+++ b/nicstat.c	Fri Aug 24 11:22:50 2012 -0700
@@ -2,7 +2,7 @@
  * nicstat - print network traffic, Kb/s read and written. Solaris 8+.
  *	"netstat -i" only gives a packet count, this program gives Kbytes.
  *
- * 05-Jun-2005, ver 0.81  (check for new versions, http://www.brendangregg.com)
+ * 02-Jul-2005, ver 0.90  (check for new versions, http://www.brendangregg.com)
  *

[...]

Now, before you agree or disagree with me on whether this is a bug, I will also say that I believe it is a feature. Yes, I feel it is an acceptable shortcut because in "real" situations an edit to a file will change the modification time by at least one second (the resolution that hg diff or hg commit is looking for). The benefit of the shortcut is greatly improved performance of operations like "hg diff" and "hg status", particularly where your repository contains a lot of files.

Why did I have no change in modification time? Well, my source file was generated by a script that I have written to convert SCCS change history to Mercurial commits. If my script can generate two revisions of a file within a second, and the files are the same size, then I run afoul of this shortcut.
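The effect is easy to reproduce outside Mercurial. The sketch below is a simplified model of the shortcut as described here (compare size and whole-second modification time only); it is not Mercurial's actual code, and the function names and timestamps are hypothetical.

```python
import os
import tempfile

def snapshot(path):
    """Record the (size, whole-second mtime) pair for a file."""
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime))

def looks_unchanged(path, recorded):
    """True when size and whole-second mtime match the recorded pair."""
    return snapshot(path) == recorded

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "nicstat.c")

    with open(path, "w") as f:
        f.write("ver 0.81\n")
    os.utime(path, (1000000000, 1000000000))    # arbitrary fixed timestamp
    recorded = snapshot(path)                   # state at "commit" time

    with open(path, "w") as f:
        f.write("ver 0.90\n")                   # same length, new content
    os.utime(path, (1000000000, 1000000000))    # same mtime, as with cp -p
    missed = looks_unchanged(path, recorded)    # the change goes unnoticed

    os.utime(path, (1000000001, 1000000001))    # one second later
    detected = not looks_unchanged(path, recorded)

print(missed, detected)
```

Any check of this kind trades a content comparison for two cheap metadata comparisons, so it can only be fooled when both size and modification time are preserved across an edit.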

Solution - I will just change my script to apply the modification time from the SCCS history to the file prior to commit. A "touch -t

Thursday Jul 28, 2011

nicstat update - version 1.90

Yes! A new version is now available with some long-awaited features. Many thanks to those who suggested improvements and helped with testing.

Changes for Version 1.90, April 2011

Common

  • nicstat.sh script, to provide for automated multi-platform deployment. See the Makefiles for details.
  • Added "-x" flag, to display extended statistics for each interface.
  • Added "-t" and "-u" flags, to include TCP and UDP (respectively) statistics. These come from tcp:0:tcpstat and udp:0:udpstat on Solaris, or from /proc/net/snmp and /proc/net/netstat on Linux.
  • Added "-a" flag, which equates to "-tux".
  • Added "-l" flag, which lists interfaces and their configuration.
  • Added "-v" flag, which displays nicstat version.
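For reference, /proc/net/snmp on Linux is laid out as pairs of lines per protocol: a line of counter names followed by a line of values. The Python sketch below parses that layout; it is not nicstat's code (nicstat is written in C), and the sample text is an abbreviated, made-up excerpt with far fewer columns than a real system reports.

```python
def parse_proc_net_snmp(text):
    """Parse name/value line pairs into {protocol: {counter: value}}."""
    stats = {}
    lines = text.strip().splitlines()
    # Even-numbered lines hold counter names, odd-numbered lines hold values.
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        _, numbers = values.split(":", 1)
        stats[proto] = dict(zip(names.split(), (int(n) for n in numbers.split())))
    return stats

# Made-up, abbreviated sample in the same layout as /proc/net/snmp.
sample = """\
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens
Tcp: 1 200 120000 -1 42 7
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 1500 3 0 1400
"""

stats = parse_proc_net_snmp(sample)
print(stats["Tcp"]["ActiveOpens"], stats["Udp"]["InDatagrams"])
```

On a Linux system the same function could be fed the contents of the real file, for example the result of open("/proc/net/snmp").read().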

Solaris

  • Added use of libdladm.so:dladm_walk_datalink_id() to get list of interfaces. This is better than SIOCGLIFCONF, as it includes interfaces given exclusively to a zone.
    NOTE: this library/routine can be (and by default is) linked into nicstat in "lazy" mode, meaning that a Solaris 11 binary built with knowledge of the routine will also run on Solaris 10 without failing when the routine or library is not found - in this case nicstat will fall back to the SIOCGLIFCONF method.
  • Added search of kstat "link_state" statistics as a third method for finding active network interfaces. See the man page for details.

Linux

  • Added support for SIOCETHTOOL ioctl, so that nicstat can look up interface speed/duplex (i.e. "-S" flag not necessarily needed any longer).
  • Removed need for LLONG_MAX, improving Linux portability.

Availability

nicstat source and binaries are available from sourceforge.

History

For more history on nicstat, see my earlier entry.

Wednesday Dec 08, 2010

Which package delivers the file/directory I can not find?

Back in the day (ahem, Solaris 10 and earlier...) you could use a command like the following to figure out which package was responsible for a file, e.g. "/usr/include/math.h", or any other "include/math.h", just in case it is somewhere less obvious:

mashie$ fgrep include/math.h /var/sadm/install/contents
/usr/include/math.h f none 0644 root bin 10514 7356 1249334889 SUNWlibm

In other words, /var/sadm/install/contents was a flat-file index of every file, directory, link, etc. installed on the system. The last field is the package name - "SUNWlibm" in this case.
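The same lookup can be sketched in Python by taking the first field as the path and the last field as the package name. The function name is hypothetical, the handling of "path=target" link entries is an assumption about the file format, and the sample text contains only the line shown above.

```python
def find_packages(contents_text, path_suffix):
    """Return (path, package) pairs for entries whose path ends with path_suffix."""
    matches = []
    for line in contents_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                          # skip blank and comment lines
        fields = line.split()
        path = fields[0].split("=", 1)[0]     # assumed: links appear as path=target
        if path.endswith(path_suffix):
            matches.append((path, fields[-1]))  # last field is the package name
    return matches

# The sample line from the fgrep output above.
sample = (
    "/usr/include/math.h f none 0644 root bin 10514 7356 1249334889 SUNWlibm\n"
)

print(find_packages(sample, "include/math.h"))
```

Unlike the fgrep one-liner, this only matches on the path field, so it will not pick up lines where the search string happens to appear elsewhere in the entry.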

Things are a little more tricky with the new packaging system in Solaris 11 Express. Here is the equivalent "pkg contents" command:

mashie$ pkg contents -o pkg.name,path -a path='*/include/math.h'
PKG.NAME                        PATH
system/library/math/header-math usr/include/math.h

About

Tim Cook's Weblog The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.
