By stp on Mar 16, 2008
What kept me away? Lots of things, which is good because it's a sign that we're pretty active here getting the Darkstar technology ramped up. In particular, however, I've been busy working on some core features for the 1.0 release. I promised that I'd talk about some of this work, so I thought I'd spend a little time this week letting folks know what's been keeping me so busy.
At the core of each Darkstar node is a scheduler. For those of you familiar with operating systems, routers, or similar systems, the notion of a scheduler is nothing new. For those of you who haven't done much systems work, suffice it to say that a scheduler is (roughly speaking) a mechanism for resource management. You have tasks that need to get done, and the scheduler is the component that decides how to order those tasks. Generally speaking, the scheduling policy is based on which resources are scarce or valuable: CPU cycles, network bandwidth, available IO ports, etc. Any given system has different constraints and different priorities that help define how you order outstanding requests for pending tasks.
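To make the idea concrete, here is a minimal sketch of a scheduler whose ordering policy is pluggable. It is not the Darkstar scheduler: `PendingTask`, `ToyScheduler`, and the `cost` field are hypothetical names invented for illustration, and "cost" simply stands in for whichever resource happens to be scarce.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical task record: a name plus an estimate of the scarce resource it needs.
final class PendingTask {
    final String name;
    final int cost;      // illustrative only, e.g. estimated CPU time units
    final long sequence; // arrival order, used to break ties

    PendingTask(String name, int cost, long sequence) {
        this.name = name;
        this.cost = cost;
        this.sequence = sequence;
    }
}

// A toy scheduler: the policy is just a Comparator over pending tasks.
final class ToyScheduler {
    private final PriorityQueue<PendingTask> queue;
    private long nextSequence = 0;

    ToyScheduler(Comparator<PendingTask> policy) {
        // Ties under the policy fall back to arrival order.
        this.queue = new PriorityQueue<>(policy.thenComparingLong(t -> t.sequence));
    }

    void submit(String name, int cost) {
        queue.add(new PendingTask(name, cost, nextSequence++));
    }

    // Returns task names in the order the policy would run them.
    List<String> drain() {
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().name);
        }
        return order;
    }
}

final class SchedulerDemo {
    public static void main(String[] args) {
        ToyScheduler cheapestFirst =
            new ToyScheduler(Comparator.comparingInt((PendingTask t) -> t.cost));
        cheapestFirst.submit("save-world", 50);
        cheapestFirst.submit("move-player", 5);
        cheapestFirst.submit("send-chat", 10);
        System.out.println(cheapestFirst.drain()); // [move-player, send-chat, save-world]
    }
}
```

Swapping in a different `Comparator` (arrival order, for example) changes the policy without touching the rest of the scheduler, which is the general shape of the problem even though a real system has far more to consider.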
What I've been working on recently is this scheduling problem. This is not a new area of research, although the kinds of factors in our system make it unusual (if you're active in the field of transactional memory you may have seen some of these discussions). Essentially, what I care about is how to accept tasks that we're going to run, and then how to decide when to run those tasks. Based on transaction conflicts we may want to re-order those tasks, and based on how tasks fail we probably want to re-try them in some intelligent manner. I've also been thinking about how to make it easy for the other components of our system (e.g., Services and other Transactional components) to take advantage of our infrastructure. Lots of interesting problems.
In the last few months we've changed some properties of our system. Anyone who has tried to write a Service has probably seen some of the superficial changes in our interfaces. What you haven't seen are the more interesting and complex changes in our core. Last Friday I committed a lot of changes to our internal source tree. Essentially, what I was working on was updates to our scheduler, transaction, and dependency model. I've introduced two schedulers (one that handles transactions and one that doesn't), added dependency handling to the transactional scheduler, and simplified how transactions get set up and run in our system. Doing this let me remove a number of classes, which makes me pretty happy.
More important than just cleaning up the code (which is always a good reason for significant updates), this work let me re-factor our re-try logic into one place that is co-located with how we do scheduling. Why is this important? When a transaction fails, there may be many causes. Before we re-try that transaction, we really want to figure out why the transaction failed in the first place and whether that introduces any dependencies in the set of scheduled tasks. If you've been looking closely at the current source release, you'll see that we don't do much in this space; when you look at our next source release, you'll see that this will change.
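As a rough illustration of why it helps to keep re-try logic next to the scheduler, here is a small sketch in which the scheduler looks at the reason for a failure before deciding where the task goes back in the queue. None of this is the actual Darkstar code: `FailureKind`, `TxnFailure`, `TxnTask`, `RetryingScheduler`, and the cap of three attempts are all assumptions made up for the example, and "run it after everything else that is pending" is only a crude stand-in for real dependency tracking.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical failure categories; a real system would know much more.
enum FailureKind { CONFLICT, TRANSIENT, PERMANENT }

final class TxnFailure extends RuntimeException {
    final FailureKind kind;
    TxnFailure(FailureKind kind) {
        super(kind.name());
        this.kind = kind;
    }
}

// Hypothetical transactional task; the attempt number starts at 1.
interface TxnTask {
    void run(int attempt);
}

final class RetryingScheduler {
    private static final int MAX_ATTEMPTS = 3; // arbitrary cap for the sketch

    private static final class Entry {
        final String name;
        final TxnTask task;
        int attempts = 0;
        Entry(String name, TxnTask task) {
            this.name = name;
            this.task = task;
        }
    }

    private final Deque<Entry> pending = new ArrayDeque<>();
    private final List<String> log = new ArrayList<>();

    void submit(String name, TxnTask task) {
        pending.addLast(new Entry(name, task));
    }

    // Runs until the queue is empty and returns a log of what happened.
    List<String> runAll() {
        while (!pending.isEmpty()) {
            Entry e = pending.pollFirst();
            e.attempts++;
            try {
                e.task.run(e.attempts);
                log.add(e.name + ":ok");
            } catch (TxnFailure f) {
                if (f.kind == FailureKind.PERMANENT || e.attempts >= MAX_ATTEMPTS) {
                    log.add(e.name + ":dropped");      // re-trying would not help
                } else if (f.kind == FailureKind.CONFLICT) {
                    pending.addLast(e);                // let the other pending tasks go first
                    log.add(e.name + ":retry-later");
                } else {
                    pending.addFirst(e);               // transient problem: try again right away
                    log.add(e.name + ":retry-now");
                }
            }
        }
        return log;
    }
}

final class RetryDemo {
    public static void main(String[] args) {
        RetryingScheduler sched = new RetryingScheduler();
        sched.submit("A", attempt -> {
            if (attempt == 1) throw new TxnFailure(FailureKind.CONFLICT);
        });
        sched.submit("B", attempt -> {});
        System.out.println(sched.runAll()); // [A:retry-later, B:ok, A:ok]
    }
}
```

The point of the sketch is only that the decision about *when* to re-try depends on *why* the task failed, so it is convenient for both decisions to live in the same component.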
One of the things that I'm exploring now is exactly how we re-try failed transactions. This can have a significant impact on contention in the system, which in turn affects how many tasks succeed, and how many times we have to hit the data store. On the whole, it's all pretty interesting. When we release the code I've talked about here (which I hope will happen within the next month or so) I'll talk more about the specifics of this behavior. Until then, I think the details would probably be confusing, but feel free to mail me if you're curious and I'll be happy to tell you more. For now, suffice it to say that there's a lot of interesting stuff going on, and next week I'll try to continue this discussion by talking about the transaction model, and some of the details around contention.
What interests you most about our model? What would you like to learn more about? Let me know!