performance analysis: profiling

A lot of folks are taking Darkstar for a spin, which is exciting. It means, however, that we're getting more questions about how to understand what's going on in the system, why a given application doesn't perform as expected, etc. This isn't surprising: Darkstar is still a work in progress, and it represents a somewhat new programming model, so as a developer you really need good tools to help you build great games. We're still building up these tools, but while some are on the wish-list, others are available today.

Reading the forums you see some discussion about profiling, or specific numbers getting cited. I've posted a couple times about the basics, and how to use profiling to learn about what's going on in the system, but this is a little scattered. I thought I'd collect some of those details here, and give a quick round-up of what's available now, how you use it, and where this work is going. Remember this is very much a work in progress, so we really want feedback about what's useful, what's confusing, and what you'd like to see added!

If you've played with profiling or runtime-debugging facilities in other systems, what we've got in Darkstar should look pretty familiar. Basically, it collects data about what's happening in the system, and provides that data as a stream of reports. If you're writing a Service, you can register to provide data into this stream, and any developer can write any number of listeners to consume this stream of reports which then choose how to aggregate or represent the output. Real easy.

Recall that Darkstar is an event-driven system. Rather than starting threads or invoking long-running, monolithic tasks, the model is many short tasks that are in response to some event. In the case of Application code these tasks are always run in a transaction, though there is also support for non-transactional tasks. These tasks get run through the system schedulers, which decide how and when to run these task. For each task that gets run through a scheduler, a new Profile Report is generated. This represents a collection of details about what happened in the scope of that task, and makes it easy to correlate events and actions (e.g., timeout and the number of objects accessed in the data store).

If you want to start playing with all of this, where should you start? Probably the best place is in the com.sun.sgs.profile package, where you'll find ProfileReport and ProfileListener. The former is the actual structure that is provided for each task run through the system. The latter is the interface that anyone can implement to consume these reports. I won't go through all the details in this entry, but hopefully the basics of the reporting structure are pretty clear (if not, we want to know about it!). In addition to basics about task type, length, delay, failure causes, etc. there are also more general-purpose components like counters and operations that let listeners track specific kinds of data-points, like bytes read by the data store, tasks scheduled, etc.

If you're impatient (well, ok, most of us fall into this category), and just want to see some data in action, skip these interfaces and go right to the com.sun.sgs.impl.profile.listener package. Here you'll find a collection of ProfileLIsteners that we've written to get folks started. For the most part, we've written these as we've found the need to watch for certain details or to answer specific questions; we're pretty sure these are useful debugging tools. You can include any number of these listeners in your running system, though of course you do have to be careful about the performance overhead. For the most part these are very light-weight, but YMMV.

So, how do you use these? There are two properties you need to set. The first defines the overall profiling system, and the second decides which listeners are actually used. The property that defines the entire profiling system is:

  com.sun.sgs.impl.kernel.profile.level

In the 0.9.6 stack all profiling is off by default, so you need to set this variable to on. In the latest trunk (which I definitely recommend you grab if you're actually playing with performance analysis) a minimal amount of profiling is always running, but you can increase the amount of data being gathered by setting the property to medium or max (by default, it's running at min).

Once you have the profiler running, the next step is to define which listeners are included. You do this with a second property on startup:

  com.sun.sgs.impl.kernel.profile.listeners

In both the 0.9.6 and the current codebase, this variable is a colon-separated list of fully qualified classes that implement the ProfileListener interface. So for instance, if you set

  com.sun.sgs.impl.kernel.profile.listeners=com.sun.sgs.impl.profile.listener.SnapshotProfileListener

then the system will start up with the SnapshotProfileListener running. This is a pretty simple class that collects data over some period of time. By default it reports on the last 10 seconds worth of tasks, detailing how many tasks ran versus how many succeeded, how long on average the scheduler queue was, and the number of threads being used by the schedulers. The output is displayed on port 43007 (by default), so once the system is running, just telnet to this port and you'll see an update every 10 seconds. This is not a particularly complicated or detailed listener, but it gives you a quick, high-level view of how much work the system is doing, and whether it's keeping up or falling behind.

Take a look at the other listeners in that package. They all have class javadocs that explain what they do and how they can be configured. Some output on sockets, some via loggers, and others to standard out. Collectively, they can give you some good insight into what's happening in the system. Obviously these are only a starting-point, so we'd love you to suggest other features, or write your own listeners to contribute to the community. Hopefully these are useful to you in understanding how you can push Darkstar and your Application code, and when things aren't working, why that is.

I haven't talked about the other end of this system, namely, how you get data into the profiling stream. If you're writing a Service and want to include some specific detail, Take a look at the com.sun.sgs.profile.ProfileRegistrar interface. This is available in the registry that you get when your Service is constructed. Through this you can register to report different kinds of events and data. Note that we don't currently expose any of these interfaces to Application code yet, so you can't include details directly from your game logic, but that's on the list of features to add and (detect a theme here?) we'd definitely like to hear thoughts about how make this most useful.

This brings me to the final issue here: where we're headed. Like I said at the start of this entry, this is still a work in progress. We've got some basic features that have proven to be pretty useful in understanding application/system behavior, but we're really just getting started.

In addition to exposing interfaces to Application code, we also plan to add hooks to capture events from some of the Services that aren't reporting anything right now (especially the Session and Channel services). These will be like the details that the Data Service currently reports. You'll be able to see which calls were made (like creating a channel or sending a direct message), how many calls were made, how many bytes were sent/received, etc.

An effort that Jane is working on right now is making the whole system more dynamic. Right now you have to set the profiling level and the set of listeners on startup, and while you can set the level globally, you can't easily tune a specific set of data (e.g., turn off all reporting except from a specific Service). She's also working on exposing some of these controls through a more general JMX-driven management interface, which should make it much easier to run and manage a Darkstar stack and control how profiling happens. The ability to set levels is only the first piece of this work.

At the same time David and I have been working on a new piece of the system to track and manage conflict. While the work is more generally designed to drive the transaction and scheduling systems, one result is that ProfileReports will now include details about what objects were accessed, and on failure, what the likely source of conflict was. This isn't in the trunk yet, but hopefully it will be soon. In the meantime, check out the contention-rev branch for the latest bits. Once we get this committed, I'll post more details about this project, and get a wiki setup to track progress and feedback.

I know this is a pretty quick, high-level overview of how profiling works, but hopefully it's useful. Remember, the goal of these features was to provide a light-weight, fairly simple set of interfaces for tracking the kinds of details that give you insight into what's happening. Like everything else in Darkstar, this is a layered system, so we've got a lot of the core collecting and reporting, but we're still building up to some of the higher-level tools and the right ways to aggregate across a cluster. We've got some good ideas about this work, but none of us has had the spare cycles to make any progress. If this sounds interesting to you please let me know, and I'll point you in the right direction.

Happy profiling!

Comments:

Hi

I have followed your step to setup the profile.The server is start correctly but i can't connect and get report using telnet(43007). I have started telnet service in my vista. Am i missing any setting about this? Please help thanks.

Yiuc

Posted by yiuc on September 03, 2008 at 08:10 PM EDT #

FYI, I don't think that you need to setup the telnet service in Windows, since the listener is using a different port than telnet services use. You will need to make sure that your local OS isn't blocking access to any port in general.

Which version of Project Darkstar are you using? Also, how are you setting the properties?

You might start by enabling one of the listeners that displays to standard out, just to see if you've got profiling configured correctly. I haven't tried doing any profiling under Vista, so I have no idea if any security policies get in the way of trying to connect to this port.

seth

Posted by Seth Proctor on September 04, 2008 at 01:52 AM EDT #

Great post and draw. Thank you for sharing.

Posted by links london jewelry on November 30, 2009 at 01:36 PM EST #

Post a Comment:
  • HTML Syntax: NOT allowed
About

stp

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today