performance analysis (1 of n)
By stp on May 16, 2008
One of the things I spend a lot of my time thinking about is performance. It's not easy to observe and profile a system like Darkstar, and it's even harder to come up with quantifiable answers to these kinds of questions. A large part of this is because the performance of the system depends heavily on the game that is running. A game that mostly does communication and persists little data will obviously behave very differently from one that does a lot of server-side manipulation of data. I end up spending a reasonable amount of time every week just looking at running applications, trying to understand what we can learn from these games and how we can use that to improve the overall performance and features of Darkstar.
Given all this, I tend not to talk a lot about specific benchmarks, or quote specific numbers in answer to the questions above. It's not that I'm unwilling to talk about the system's performance; Darkstar is open source, and anyone can run it and see for themselves how it behaves. No, it's just that it's really hard to come up with general truisms, and I'd rather cite meaningful data. Regarding the forum post above, however, there was a question about specific hardware. After a short disclaimer that you really can't make general statements about how many players will fit on a given system, I made a somewhat flip comment about a specific Sun product that we've been using for testing. I say "somewhat" because I've played with this hardware enough to know that it's an excellent target system for the size of game that was being asked about. I say "flip" because I thought it would be obvious that I was joking around a little (note the smiley). I was very interested by the flurry of responses.
First off, I think a lot of the questions being asked are exactly right. You need to know what tests are being used to come up with the number of clients that a piece of hardware can support. You need to understand what kind of behavior is being stressed or simulated, and why that looks at all like a real application. You need to know whether this is being run in different scenarios, with different kinds of networks, different client behaviors, etc. Without these kinds of data points, metrics are pretty useless.
What tests have I been running? Unfortunately, the core team hasn't had a lot of time to write full applications, so the tests are still somewhat limited. One test we're running comes from Project Wonderland, which is building a set of automated stress tests. You can download these directly from their subversion tree. Another source of data is the set of simple tests in the Darkstar test and example directories. We have a few test apps that we're working to clean up and get posted. We also have a few test cases that I can't share right now (sorry). I will take it as a task to get a wiki page going with performance results, so some of the specific numbers we're seeing can be collected.
Second, while it's important to base metrics on understood test cases, I think it's also important not to get so caught up in the minutiae that systems become less usable. A lot of people come to Darkstar and want to get a rough sense of what they should expect. Is it reasonable to expect that an average game can host 200 clients on a server? If so, what kind of server? No, it's not very scientific, but answering these kinds of baseline questions helps people who are new to the technology and the model, and may get them interested enough to start learning more about what that actually means.
This is what I was trying to do in the forums: engage a newcomer to the community, with the caveat that the answer doesn't mean very much. I think it's important to set coarse-grained expectations, even if that doesn't match my instincts as a systems hacker. I also know from experience profiling many applications that the expectation I was setting is pretty reasonable. Besides trying to set some expectations, I did something else: I asked for more details about the application. This is really important, because the whole community really needs to hear more about how Darkstar is being used, and how well it's performing for any given design. I was pretty disappointed that no one picked up on this point, especially because everyone chiming in on that thread has at least one game that they could profile today.
So, finally, let me issue a challenge back to the community. Yes, the core Darkstar development team can and arguably should be posting benchmarks and profiling experience. However, anyone else in the community can be doing the same thing. I want to hear about your applications. I want to hear what your design looks like, and what assumptions you're making. I want to see detailed profiling results that illustrate what you're experiencing (I'll make it a priority to post to my blog soon about the profiling system in Darkstar). The questions that get asked in the forums are often very good, but I'd like to see more data get pooled. If I set up a wiki (or something) for collecting these experiences, who will step up and help people understand how Darkstar is actually working?