By Henrik Ingo on Sep 28, 2008
In the previous post we had a look at the MySQL Cluster NDB API and how it enables direct access to the MySQL Cluster data nodes, and therefore also enables access through other protocols than SQL.
I've often asked myself: Since NDB is so great for MySQL Cluster, is there anything similar for MySQL Server (the not-cluster version...)? A couple of months ago Kazuho Oku did something like that and wrote in his blog about it.
The context for Kazuho's work is the social network use case: 1) You have users. 2) Some users are linked to each others as friends. 3) When a user logs in, he should see a timeline of events/messages from his friends. In a previous post he had already tested the difference between a "pull" and "push" approach. (Taking a small sidetrack here, imho both approaches are wrong: The messages/events should first be submitted only in the originating users table, then copied to each recipient by an asynchronous background process. This would give you the best of both worlds, submission speed of pull model and retrieval speed of push model. Anyway...)
For the test we are talking about now, Kazuho is exploring the pull model: When a user logs in, a query is executed to fetch messages/events from all of the user's friends. Kazuho then compared 3 ways to do this: By SQL from the app, by a stored procedure that does everything at once, and by a MySQL User Defined Function. (A UDF is something you write and compile in C and install as a plugin to the server. It can then be called simply as "SELECT functionname(...);".) The UDF is accessing the InnoDB data structures directly using MySQL and InnoDB internal functions, so it is reminiscient of using the NDB API to bypass SQL in MySQL Cluster.
Kazuho's results are clear:
|timelines / sec.|
|UDF using Direct Access||1,710|
1) This is a good example of a use case where using a stored procedure gives you an advantage over raw SQL. Never mind if you think MySQL stored procedures are inefficient or not, these numbers are clear, the stored procedure approach is 2,5 times more efficient.
2) The UDF rocks! Accessing InnoDB structures directly, it is 10+ times faster than the stored procedure.
There is one drawback though. Accessing the storage engine directly inside MySQL is a bit dangerous. There is no well defined API so there is no guarantee that the next version will not break your UDF. Well, I guess it wouldn't but in theory at least. And a UDF is something you have to install in the server, it is not a client API in that sense. But getting 10 times better performance is something to think about, if you're ready to get your elbows dirty.
PS. I hear the social networking problem is an especially interesting one from this point of view, in that it doesn't map easily to plain old relational databases and SQL. Getting what you want is a bit inefficient with SQL. Kazuho's UDF's show that it can be done tenfold more efficient, by accessing the data in a more optimum way. This is of course exactly the point with native data access.