Monday Jul 13, 2009

Jumping through hoops

Sometimes you find something that sounds like it should be fairly straightforward but ends up being a bit more complicated than you thought.

Our intern Frank commented that he'd really like to have a little more agility in trying out the code that he's working on. You see, we run our data store on a grid. The grid is physically somewhere else, and more importantly, it runs our code in a virtual network (you create a network to run your processes in so everything is isolated). Every process you start runs in a Solaris zone with its own IP address(es). So in order to run some test code, you need to build it, upload it, then restart your process. This doesn't take a lot of time, but it adds several steps between "change a line of code" and "try it out against our live system". To get the results, you need to either look at an output file that your process creates, or you need to connect to some server providing it (probably a web server that can talk to the process you ran). Fairly cumbersome.

This has been a problem for a while, but until now we could get by testing our code against a small dataset on a "distributed" instance of our data store that actually runs entirely in one JVM on our development machines. Frank is working on stuff that really needs "big data", so that isn't going to cut it anymore. The external interface to our live data store is based on web services, but for heavy computation we want to speak directly to the store over RMI. So, since we seem to be letting our interns dictate what we work on, I went to work providing a solution.

One of the many nice features of the research grid we're running on is that we can use a VPN client to connect into our virtual network. Once I figured out how to configure this and allocate the resources for it, it worked beautifully. This should be easy now, right? Just get your client machine on the network and you can use RMI to talk directly to our data store! Except it didn't work. We use Jini to self-assemble the pieces of our data store and also to discover the "head" nodes that allow client access. It turns out the VPN client wasn't handling the multicast packets needed to do discovery. No problem: I specified the IP address that was assigned to our RMI registry process. Still no luck. It would attempt to load our services, but it would hang and eventually time out. This was getting tricky.
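Falling back from multicast discovery to an explicit address looks roughly like this in plain RMI (Jini's unicast LookupLocator plays the same role). Everything here is hypothetical -- the Head interface, the "datastore-head" name, and the port are stand-ins, not our actual service names -- just a sketch of the client-side lookup:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class UnicastLookup {
    // Hypothetical remote interface standing in for a data store "head" node.
    public interface Head extends Remote {
        String ping() throws RemoteException;
    }

    // Client side: no multicast discovery, just a known host and port.
    public static String lookupAndPing(String host, int port) throws Exception {
        Registry reg = LocateRegistry.getRegistry(host, port);
        Head head = (Head) reg.lookup("datastore-head");
        return head.ping();
    }

    public static void main(String[] args) throws Exception {
        // Server side, which would normally run in a Solaris zone on the grid.
        Head impl = () -> "pong";
        Head stub = (Head) UnicastRemoteObject.exportObject(impl, 0);
        LocateRegistry.createRegistry(2099).rebind("datastore-head", stub);

        System.out.println(lookupAndPing("127.0.0.1", 2099));
    }
}
```

The lookup side works fine over the VPN, since RMI itself is plain TCP; it was only the multicast discovery step that the VPN client couldn't carry.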

One thing I noticed from the start was that name resolution wasn't working at all. Perhaps Java was trying to look up the names of hosts and timing out; enough host name timeouts, and our own connection timeout would be reached and the lookup would fail. The problem was that while all of the processes running on the grid itself (inside Solaris zones) had access to the grid's name server, the VPN clients connecting in could not actually route packets to the DNS server. (Remember, everybody gets their own private network, but name service is provided by the grid -- you don't need to, or want to, run your own.) We could have started our own name server, or we could have made a hosts file with the names of all the machines we use. The problem is that this is a dynamic environment: from one start of the system to the next, the actual addresses may change. (Note that we don't care what the addresses are for the system to work -- Jini lets us self-discover all the components.) The solution I chose was to start an on-grid process that would relay DNS requests to the actual name server and back.

Of course, I initially thought that even this would be easy because there are any number of port-forwarding Java programs out there, and I've even written one or two myself for various tasks. I started by deploying one of these and was surprised to discover that it still didn't work. No name resolution. Fortunately, I happened to look at the little network monitor I was running and noticed (with a loud slap of my forehead) that of course name resolution usually uses UDP, not TCP (which is what all these port forwarders are written for). I hoped that maybe I could configure my name resolver to use only TCP (yeah, much less efficient, but still tiny in the grand scheme of things), but no such luck.

So, I looked around and was (I guess not very) surprised to find that there wasn't any Java code out there for running a simple UDP port forwarder. Fortunately, the code turns out to be relatively simple -- at least, it was once I reminded myself how UDP works. I wrote a simple process that receives datagrams on port 53, then chooses an unused outgoing port and resends each datagram (after rewriting the source and destination addresses) to the actual DNS server. It starts a thread that waits for the response on that port, then sends the response datagram back (again, with the addresses rewritten) to the original requesting machine (the VPNed client machine). I deployed the code as a grid process and sure enough, it worked like a charm!
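That forwarder is short enough to sketch here. This is a simplified version, not our production code -- the class name, buffer size, and timeout are illustrative: it takes a socket bound to the listen port, and for each request it opens a fresh socket on an unused port, relays the datagram upstream, and hands the reply back to the original sender.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.net.SocketAddress;

// Simplified UDP forwarder: relays each datagram received on a local
// socket to an upstream server, and relays the single reply back to
// whichever client sent the request.
public class UdpForwarder implements Runnable {
    private final DatagramSocket listenSocket;  // bound to port 53 in deployment
    private final InetSocketAddress upstream;   // the real DNS server

    public UdpForwarder(DatagramSocket listenSocket, InetSocketAddress upstream) {
        this.listenSocket = listenSocket;
        this.upstream = upstream;
    }

    public void run() {
        byte[] buf = new byte[1500];
        while (true) {
            try {
                DatagramPacket request = new DatagramPacket(buf, buf.length);
                listenSocket.receive(request);               // block for a request
                byte[] data = new byte[request.getLength()]; // copy; buf is reused
                System.arraycopy(request.getData(), 0, data, 0, data.length);
                SocketAddress client = request.getSocketAddress();
                new Thread(() -> relay(data, client)).start();
            } catch (Exception e) {
                return;                                      // socket closed
            }
        }
    }

    // Resend the datagram from a fresh (unused) port, wait for the reply,
    // and send it back to the original requester via the listen socket.
    private void relay(byte[] data, SocketAddress client) {
        try (DatagramSocket out = new DatagramSocket()) {    // ephemeral port
            out.send(new DatagramPacket(data, data.length, upstream));
            out.setSoTimeout(5000);                          // don't wait forever
            byte[] buf = new byte[1500];
            DatagramPacket reply = new DatagramPacket(buf, buf.length);
            out.receive(reply);
            listenSocket.send(new DatagramPacket(reply.getData(),
                                                 reply.getLength(), client));
        } catch (Exception e) {
            // on a timeout or error, just drop the request, as plain UDP would
        }
    }
}
```

Note that the address rewriting falls out of UDP naturally: each relayed datagram is sent from the forwarder's own socket, so the upstream server replies to the forwarder, and the forwarder's listen socket is the source address the client sees.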

All told, this probably should have been easier. On the other hand, we can now connect our machines into the network that runs our live system and run code directly against it. What I didn't mention is that Frank isn't actually running Java code. He wants to run Python code that uses a native number-crunching library that starts a JVM, which lets him pull Java objects from the data store and pass them into the Python library. He's running all of that in a virtual Linux box running in VMware on his Mac (which is connected to the virtual network on the grid via VPN). We don't know yet just how slow this is all going to be.

Monday Jun 22, 2009


Hi folks.  I'm a researcher in Sun Labs working on The AURA Project.  I am responsible for designing and developing the distributed Data Store.  We're interested both in studying existing data stores and in playing with creating our own.  Making our own has a couple of advantages.  First, it can be customized to do things the way we want.  Second, it gives us a good perspective from which to understand and evaluate other distributed data stores.  We have different needs than most distributed key/value stores, though.  We don't just want to store values and retrieve them by key.  We're interested in doing some computation on the data that will help us to generate recommendations on the fly.  That computation is best done close to the data, so we have a special-purpose data store that can handle these types of queries and give back results quickly enough that we can build a live service on them.  We do them on the fly so that you can give feedback to the recommender and have it immediately take your preferences into account.

The data store isn't exactly mature, but it is functional and stable.  I'll post to this blog as I have thoughts or interesting (in my opinion) problems regarding the data store.  I may also post observations about our data store and how it compares to others which may not be at all interesting to you but may help me as I build a mental picture of what the realm of existing data stores looks like.

You can get the source code to our project on Kenai, and you can try out our music recommender system built on our data store.


Jeff Alexander is a member of the Information Retrieval and Machine Learning group in Oracle Labs.

