A DTrace Provider for NFS

As a kernel developer working on NFSv4, I love DTrace. The fbt and syscall providers are my constant companions.

But last June, at the NFSv4 Bakeathon, I had a more difficult problem -- an infinite loop involving volatile filehandles. The problem was reproducible (good), but it took a couple of minutes running a test suite to get to that point (bad). Thousands of over-the-wire operations occurred before the more interesting things began to happen.

Using snoop, which "knows" the NFSv4 protocol, I knew what Solaris NFSv4 was doing, but I didn't know why it was doing it. DTrace could tell me the "why" part, but I couldn't think of a script that would get me just the right information, without drowning me in uninteresting information.

Sleep deprived, I began whining about how DTrace should have a new provider; one than "knows" the NFSv4 protocol, and lets you write protocol related events into your scripts. A provider that lets you do things like grab filehandles, and stuff them into variables, to be used later in conditionals.

Upon getting home, I began writing such a provider. In fact, I wrote one that could solve the problem discussed above. As it grew, there were some parts of the code that were getting messy, so now I'm in the midst of refactoring the provider.

A New Provider

Here is how the NFSv4 provider is starting to look. As usual for DTrace probes, we have:


Provider is either "nfs" or "nfs-server". Plain old "nfs" is the NFS client -- the thing that lets you mount remote file systems. Yes, this means that technically there are really two providers, but they will be nearly the same structure. Learning one provider will be sufficient to learn both.

Module is either "nfs" or "nfssrv", again for client and server. I would recommend just leaving this part blank, since it doesn't buy you any more than the provider slot. If you do not leave it blank, make sure that provider and module match. If they don't match, you won't get any probes. For example, "nfs:nfssrv:<something>:<something>" will not match any probes.

Function is an operation or an attribute. More on that below. :-)

Name is either "start" or "done". For a probe on an client operation, "start" would fire when the client sent the operation over-the-wire to the server, and "done" would fire when it received its response from the server. For the server, "start" would fire when the server received a request, and "done" would fire when the server answered the request. Callback functions, where the server makes a request of the client, are just what you would expect -- client "start" fires when the client first receives a request from the server, etc.

The Probes

As mentioned above, the function slot is an operation or an attribute. Operations are of the form "op-<operation-name>", where <operation-name> is an operation defined in the NFSv4 protocol. Examples:

nfs::op-read:start /\* client did a read operation \*/

nfs-server::op-read:start /\* server got a read request \*/

nfs::op-compound:done /\* client's compound op finished \*/

For attribute probes, we use "attr-<attribute-name>", where <attribute-name> is a type defined in the NFSv4 protocol. It can be either an argument to an operation (examples:)

nfs::attr-seqid4:start /\* client sent a sequence ID with some op\*/

nfs::attr-filehandle:done /\* client received a filehandle \*/

or an attribute of a file. Examples:

nfs::attr-owner:start /\* client is sending an owner file attribute \*/

nfs::attr-filehandle:done /\* client received a filehandle \*/

Above, notice that attr-filehandle makes sense as an argument to an operation, or an attribute of a file. The nice thing about the attr-\* probes is that you can trace these attributes going over the wire, without caring about what operation is sending them, or how it's being sent.

Arguments to the Probes: args[0] and args[1]

For all probes, there are two arguments. args[0] is the same for all probes: it is a structure that holds things such as the tag and transaction id (xid) of the operation. It will likely hold the network address of the machine that it's talking with; more things may be added in the future.

xidTransaction ID
tagDescriptive tag for the request (see section 14.2 of the NFSv4 spec).
addrAddress of the remote host (in a format yet to be determined)
more may be added; stay tuned.

The second argument, args[1], is different for every probe. For op-\* probes, it's the structure that is being sent or received. For attr-\* probes, it's just the attribute itself. Examples:

ProbeType for args[1]
Types for attributes may tend to be more simple types, but this is still to be decided. Here are some possibilities:

Any working examples yet?

The provider is still being implemented, and it doesn't yet handle very many of the details of the NFSv4 protocol. But the framework is there, and it does handle "compound", which is the fundamental over-the-wire operation in NFSv4.

As its name implies, a compound operation can do more than one thing. For example, it can create a new file and get the new file's attributes, all in one over-the-wire operation. As mentioned before, compounds have a field called a "tag", which is a descriptive string attached that very briefly describes the purpose of the request. The tag for our hypothetical create-file-and-get-its-attributes operation might simply be "create".

Here is a simple D script that uses the op-compound probe to show which over-the-wire requests the client spends the most time waiting upon. The requests are collated by the tag. You could use such a script to help get an idea of what compounds are taking the most time, and maybe start thinking about where to do further performance analysis.

Following the example script is example output. While running this example script, I used "tar" to copy a bunch of files within an NFS mounted directory, and then used "rm -rf" to delete them. Don't treat this as a serious benchmark or anything -- it's just to give an idea of what's easily possible with the new provider.

#! /usr/sbin/dtrace -s

#pragma D option quiet

 \* Record the time when a request is sent to the server.

    cstart[args[0]->xid] = timestamp;

 \* We received a response from the server.  Below, all times are converted
 \* from nanoseconds to microseconds (dividing by 1000); this was more readable
 \* in my particular setup.

/cstart[args[0]->xid] != 0/
    /\* Stuff the descriptive "tag" into a convenience variable \*/
    this->tag = stringof(args[0]->tag);

    /\* Compute total time spent processing a given tag \*/
    @ttime[this->tag] =
        sum((timestamp - cstart[args[0]->xid]) / 1000);

    /\* Free up the cstart[] slot \*/
    cstart[args[0]->xid] = 0;

 \* We're done, probably via the user hitting \^C.  Print the results.
    printf("\\ntotal time spent in calls\\n");


total time spent in calls

  renew                                                           276
  commit                                                        11054
  link                                                          21647
  rmdir                                                        367802
  mkdir                                                        422512
  access                                                       599114
  readdir                                                      753105
  lookup valid                                                 822106
  lookup                                                       862495
  read                                                        1026893
  close                                                       1095138
  getattr                                                     1841042
  write                                                       6674532
  remove                                                      6802618
  setattr                                                     7101811
  open                                                        7539189

Technorati tags:

Post a Comment:
Comments are closed for this entry.



« August 2016