
HTTP/WebDAV Analytics

Guest Author
Mike calls Analytics the killer app of the 7000 series NAS appliances. Indeed, this feature enables administrators to quickly understand what's happening on their systems in unprecedented depth. Most of the interesting Analytics data comes from DTrace providers built into Solaris. For example, the iSCSI data is gathered by the existing iSCSI provider, which allows users to drill down on iSCSI operations by client. We've got analogous providers for NFS and CIFS, too, which incorporate the richer information we have for those file-level protocols (including file name, user name, etc.).

We created a corresponding provider for HTTP in the form of a pluggable Apache module called
mod_dtrace. mod_dtrace hooks into the beginning and end of each request and
gathers typical log information, including local and remote IP addresses, the
HTTP request method, URI, user, user agent, bytes read and written, and the HTTP
response code. Since we have two probes, we also have latency information for
each request. We could, of course, collect other data as long as it's readily
available when we fire the probes.
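
To make the latency point concrete, here's a minimal sketch in C of how a pair of start/done events yields a per-request latency. This is not mod_dtrace's code: req_timing_t, on_request_start and on_request_done are hypothetical names made up for illustration, and the timestamps are passed in as plain nanosecond counts rather than read from a clock.

```c
#include <stdint.h>

/*
 * Hypothetical per-request state (an assumption for this sketch,
 * not a structure from mod_dtrace).
 */
typedef struct {
    uint64_t start_ns;  /* timestamp recorded at the start event */
    int      started;   /* nonzero once the start event was seen */
} req_timing_t;

/* Called where a "request start" event would fire. */
void
on_request_start(req_timing_t *t, uint64_t now_ns)
{
    t->start_ns = now_ns;
    t->started = 1;
}

/*
 * Called where a "request done" event would fire. Returns the elapsed
 * nanoseconds since the matching start, or 0 if no start was recorded
 * (or the clock appears to have gone backwards).
 */
uint64_t
on_request_done(req_timing_t *t, uint64_t now_ns)
{
    uint64_t latency = 0;

    if (t->started && now_ns >= t->start_ns)
        latency = now_ns - t->start_ns;
    t->started = 0;  /* a done event consumes the start */
    return (latency);
}
```

Calling on_request_start(&t, 1000) followed by on_request_done(&t, 4500) gives 3500 ns. In a D script the same idea is usually expressed by saving a timestamp in the first probe and subtracting it in the second.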

The upshot of all this is that you can observe HTTP traffic in our Analytics screen and drill down in all the ways you might hope.


Caveat user

One thing to keep in mind when analyzing HTTP data is that we're tracking individual requests, not lower-level I/O operations. With NFS, for example, each operation might be a read
of some part of the file. If you read a whole file, you'll see a bunch of
operations, each one reading a chunk of the file. With HTTP, there's just one request, so you'll
only see a data point when that request starts or finishes, no matter how big the file is. If one client is
downloading a 2GB file, you won't see it until they're done (and the latency might be very high, but that's not necessarily indicative of poor performance).
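
A quick back-of-the-envelope sketch shows how different the two views are. The C function below counts how many read operations a chunked protocol would issue for one whole-file read. The 32 KiB chunk size used in the note that follows is purely an assumption for illustration (actual NFS transfer sizes vary by client and configuration), and chunked_ops is a hypothetical helper, not part of the appliance.

```c
#include <stdint.h>

/*
 * Number of read operations needed to fetch file_bytes when each
 * operation returns at most chunk_bytes. Returns 0 for an empty file
 * or a zero chunk size.
 */
uint64_t
chunked_ops(uint64_t file_bytes, uint64_t chunk_bytes)
{
    if (chunk_bytes == 0)
        return (0);
    /* Ceiling division without risking overflow in the addition. */
    return (file_bytes / chunk_bytes + (file_bytes % chunk_bytes != 0));
}
```

With a 2 GiB file and the assumed 32 KiB reads, chunked_ops(2147483648ULL, 32768) comes to 65536 operations, each a separate data point, against a single request for the equivalent HTTP download.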

This is a result of the way the protocol works (or, more precisely, the way it's used). While NFS is defined in terms of small filesystem operations, HTTP is defined in terms of requests, which may be arbitrarily large (depending on the limits of the hardware). One could imagine a world in which an HTTP client that's implementing a filesystem (like the Windows mini-redirector) makes smaller requests using HTTP Range headers. This would look more like the NFS case: there would be requests for ranges of files corresponding to the sections of files that were being read. (This could have serious consequences for performance, of course.) But as things are now, users must understand the nature of protocol-level instrumentation when drawing conclusions based on HTTP Analytics graphs.
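
For illustration, here's roughly what such a hypothetical client would have to compute for each chunk it reads. This is a sketch under assumptions: format_range_header is a made-up helper and the chunk size is whatever the caller picks. The only thing taken from HTTP itself is the "Range: bytes=first-last" form, in which both offsets are inclusive.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Writes the Range header for chunk number `index` (zero-based) of a
 * file of file_bytes bytes into buf. Returns the snprintf() result, or
 * -1 if the chunk starts at or beyond the end of the file.
 */
int
format_range_header(char *buf, size_t buflen, uint64_t file_bytes,
    uint64_t chunk_bytes, uint64_t index)
{
    uint64_t first, last;

    /* Reject a zero chunk size and offsets that would overflow. */
    if (chunk_bytes == 0 || index > UINT64_MAX / chunk_bytes)
        return (-1);

    first = index * chunk_bytes;
    if (first >= file_bytes)
        return (-1);

    /* Default to the last byte of the file, then shrink to one chunk. */
    last = file_bytes - 1;
    if (chunk_bytes - 1 < last - first)
        last = first + chunk_bytes - 1;

    return (snprintf(buf, buflen, "Range: bytes=%llu-%llu",
        (unsigned long long)first, (unsigned long long)last));
}
```

For a 100000-byte file read in 32768-byte chunks, chunk 0 yields "Range: bytes=0-32767" and the final chunk (index 3) yields "Range: bytes=98304-99999". Each of those requests would then show up as its own data point.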

Implementation

For the morbidly curious, mod_dtrace is actually a fairly straightforward USDT provider, consisting of the following components:

  • http.d defines http_reqinfo_t, the stable structure used as an
    argument to probes (in D scripts). This file also defines translators
    to map between httpproto_t, the structure passed to the DTrace probe
    macro (by the actual code that fires probes in mod_dtrace.c), and the
    pseudo-standard conninfo_t and aforementioned http_reqinfo_t. This
    file is analogous to any of the files shipped in /usr/lib/dtrace on a
    stock OpenSolaris system.
  • http_provider_impl.h defines httpproto_t, the structure that
    mod_dtrace passes into the probes. This structure contains enough
    information for the aforementioned translators to fill in both the
    conninfo_t and http_reqinfo_t.
  • http_provider.d defines the provider's probes:

        provider http {
                probe request__start(httpproto_t *p) :
                    (conninfo_t *p, http_reqinfo_t *p);
                probe request__done(httpproto_t *p) :
                    (conninfo_t *p, http_reqinfo_t *p);
        };
  • mod_dtrace.c implements the provider itself. We hook into Apache's
    existing post_read_request and log_transaction hooks to fire the
    probes (if they are enabled). The only tricky bit here is counting
    bytes, since Apache doesn't normally keep that information around. We
    use an input filter to count bytes read, and we override mod_logio's
    optional function to count bytes written. This is basically the same
    approach that mod_logio uses, though it is admittedly pretty nasty.

We hope this will shed some light on performance problems in actual customer environments. If you're interested in using HTTP/WebDAV on the NAS appliance, check out my recent post on our support for system users.

Comments ( 2 )
  • Sean Friday, November 21, 2008

    Is it possible to share more details on httpd.d and mod_dtrace.c or share the code?


  • Dave Pacheco Friday, November 21, 2008

    @Sean: Sure, I'll provide more details on writing USDT providers (using this one as an example) in an upcoming entry.

