Sunday Sep 28, 2008

Accessing your MySQL data whatever way you want it (Part 2, InnoDB)

In the previous post we had a look at the MySQL Cluster NDB API and how it enables direct access to the MySQL Cluster data nodes, and therefore also enables access through other protocols than SQL.

I've often asked myself: Since NDB is so great for MySQL Cluster, is there anything similar for MySQL Server (the not-cluster version...)? A couple of months ago Kazuho Oku did something like that and wrote in his blog about it.

The context for Kazuho's work is the social network use case: 1) You have users. 2) Some users are linked to each others as friends. 3) When a user logs in, he should see a timeline of events/messages from his friends. In a previous post he had already tested the difference between a "pull" and "push" approach. (Taking a small sidetrack here, imho both approaches are wrong: The messages/events should first be submitted only in the originating users table, then copied to each recipient by an asynchronous background process. This would give you the best of both worlds, submission speed of pull model and retrieval speed of push model. Anyway...)

For the test we are talking about now, Kazuho is exploring the pull model: When a user logs in, a query is executed to fetch messages/events from all of the user's friends. Kazuho then compared 3 ways to do this: By SQL from the app, by a stored procedure that does everything at once, and by a MySQL User Defined Function. (A UDF is something you write and compile in C and install as a plugin to the server. It can then be called simply as "SELECT functionname(...);".) The UDF is accessing the InnoDB data structures directly using MySQL and InnoDB internal functions, so it is reminiscient of using the NDB API to bypass SQL in MySQL Cluster.

Kazuho's results are clear:

Building Timelines on MySQL
timelines / sec.
Stored Procedure136
UDF using Direct Access1,710

1) This is a good example of a use case where using a stored procedure gives you an advantage over raw SQL. Never mind if you think MySQL stored procedures are inefficient or not, these numbers are clear, the stored procedure approach is 2,5 times more efficient.

2) The UDF rocks! Accessing InnoDB structures directly, it is 10+ times faster than the stored procedure.

There is one drawback though. Accessing the storage engine directly inside MySQL is a bit dangerous. There is no well defined API so there is no guarantee that the next version will not break your UDF. Well, I guess it wouldn't but in theory at least. And a UDF is something you have to install in the server, it is not a client API in that sense. But getting 10 times better performance is something to think about, if you're ready to get your elbows dirty.

PS. I hear the social networking problem is an especially interesting one from this point of view, in that it doesn't map easily to plain old relational databases and SQL. Getting what you want is a bit inefficient with SQL. Kazuho's UDF's show that it can be done tenfold more efficient, by accessing the data in a more optimum way. This is of course exactly the point with native data access.

Monday Sep 01, 2008

Accessing your MySQL data whatever way you want it

One way to look at a database is that

  1. there is data, and

  2. there are ways to access data.

This dichotomy was actually coined (whether intentional or not) by Vinay in the MySQL telecom team when discussing the MySQL Cluster vision some months ago.

Even if you typically think of MySQL Cluster as just a clustered version of the plain old MySQL server, it is actually more like the opposite is true, if you consider the architecture and history of MySQL Cluster. The original cluster was just the data store called Network DataBase or NDB as we familiarly know it still. Then MySQL Server was integrated on top of that to provide an SQL interface. The original and "native" NDB interface is still there though, and many prefer to use direct C++ or Java access to their NDB-clustered data. It is faster just in general, but especially applications with a real-time requirement will benefit from omitting the MySQL Server. No additional network hop and no parsing of SQL, just direct access to your data. Sometimes also you might benefit from being able to do things with the NDB API that cannot be efficiently expressed in SQL at all.

But did you know that in addition to the SQL and NDB API there are actually multiple more ways to interface with MySQL Cluster available:

  • LDAP was presented at a MySQL User Conference BOF last spring and is actually available now as an Option in the Carrier Grade Edition. The LDAP interface is actually an OpenLDAP server node, using a native NDB backend in the slapd.

  • mod_ndb is a REST Web Services API for MySQL Cluster. This one is (obviously) implemented as an Apache module. Although produced by our own John 'JD' Duncan, it is not a Sun supported product.

  • We also know of a case where MySQL Cluster is accessed through the RADIUS protocol, although I don't think this implementation is publicly available.

  • And someone also does it with DIAMETER, a successor to RADIUS.

I don't know details on the 2 last ones, but at least the 2 first ones use NDB directly. That is much more efficient and convenient than for instance doing some LDAP->SQL conversions when SQL really isn't needed in the first place. Moreover, you did realize that all these interfaces are equal citizens with the mysqld - they are all just api nodes. Meaning, you could have one big cluster and access that same data with SQL, LDAP, HTTP, RADIUS and DIAMETER, and of course directly from your application code with the NDB C++ or Java API. Which brings us back to the title for this blog post: You have data and you have ways to access the data. Whatever ways suits you the best.

Then of course for the interesting question? Are there more protocols/API's out there for MySQL Cluster that we don't know about? (Or that I ommitted by mistake?) Are there some protocols there that would be interesting to implement? Let us know at hingo at mysql dot com (or comment below)!


The people of the MySQL Telecom team writes about developments around MySQL and MySQL Cluster and how these products are used by our Communitcations industry customers. (Image jasmic@Flickr)


« July 2016