Friday Aug 07, 2009

MySQL Proxy benchmarking and profiling

On our road to a MySQL Proxy GA release, there are quite a few things that need attention, one of them being multithreading.

We are actively working on release 0.8, which brings the first stage of multithreading to the table: Enabling worker threads to handle network events in parallel. The code has been available for quite some time now and we’ve started to implement several performance benchmarks to track our progress, catch regressions and deficiencies early on.

Benchmarking is an interesting field, especially since you can screw up so easily :)

To avoid making mistakes and to spend less time reinventing the wheel, we are doing the same as we are doing with the code: Stand on the shoulders of giants.

To that end, we are using existing tools and toolsets as much as we can, and one of the most popular drivers to run with MySQL is sysbench. However, we’ve found that while sysbench generates load, it doesn’t support all the different scenarios we are trying to test in.

Let me quickly summarize what we have to look at when judging performance with MySQL Proxy:

  • We need lots of connections, to make sure the event-based IO we are based on scales appropriately.
  • We want those connections to be of different kinds:
    • lots of idle connections
    • lots of active connections
    • a predetermined ratio of the two above
  • Ideally we would like to run a predetermined number of queries/second, to compare CPU usage as we optimize the code (optimization is another ankle biter…)
  • We need to vary the different variables that happen in real world scenarios
    • resultset sizes (average ones, standard deviation of them, fixed sizes etc)
    • query execution times (our performance characteristics depends on the distribution of really fast queries and slower ones, we generally do better with slower ones right now)
    • connection establishment/persistent connections ratio (there’s overhead involved with establishing new connections, e.g. the entire handshake)

As you can see we are aiming to cover the most common scenarios you’d encounter in production environments. Now, obviously we don’t have a private copy of Facebook to play with, let alone the load its users generate (and to be honest, I don’t want the headaches involved on top of my existing headaches ;)).

We’ve established that sysbench is not the silver bullet. Then, what is?

So far we haven’t found anything pre-made that lets us do what we need, thus our only option is to write the benchmarks ourselves. But since we aren’t really in the business of doing performance analysis (after all we are getting paid to ship software, not to write benchmarks all day) we have been looking around.

It turns out that there’s a really nice framework that enables you to focus on writing exactly your benchmark, without worrying too much about scaling the drivers, creating graphs and deploying the whole lot: Faban. What’s even funnier is that it’s written by our colleagues here at Sun – sometimes the good is so near.

Faban takes care of most things for you, including running monitoring tools like mpstat or iostat during the benchmark run, includes fenxi to create nice graphs and to compare benchmark runs and is written in Java which makes it nicely portable across different systems (though you really want to have Java6 to fully take advantage of it – hardly a problem nowadays).

The best thing about all of this is: We will be able to scale out our benchmarks to many driver machines to create insane amounts of traffic, much like JMeter supports.

Now, that all being said, sysbench has been proven to be useful to us and until our new benchmarks are ready to be put to use, we are “stuck” with it. Actually we probably will end up running it in the future, too, but that’s a different story. To make it all nicer on the people who actually run the tests, I’ve integrated sysbench into Faban, so we can run it and gather its output from and in a single place. If nothing else, it was a nice way to dive into Faban’s internals and gather experience for the other benchmarks.

Another nice side effect of using Faban is, that other people can run it on their systems, Faban is open source, and once we publish our benchmark code, other people can verify our results, on their own machines, a prerequisite for getting valid results.

And who knows, maybe they will be useful to people for completely other purposes than the ones we are putting them to use for.

Thus, a little request: If you have suggestions or requests on what you would like to see in a generic MySQL-specific benchmark leave a comment or contact us on our Launchpad discuss list (you need to register on Launchpad and then join the list – all free as in beer).

Thursday Jul 16, 2009

MySQL Proxy Hackathon 2009

During my talk at the MySQL Users Conference, Heidi came up with the idea of holding a hackathon for Proxy.

It seemed well received, and we mulled it over a bit.

At this point we are proposing the following:

  • Meet for two days on October 15th and 16th, i.e. Thursday and Friday
  • Venue will be the Sun Office, Hamburg, Germany
  • Topics include (but bring your own!):
    • Introduction to the code base
    • Work on bug fixes
    • Discuss ideas for going forward
    • Prototyping ideas
    • Integrate your work into the mainline (as it makes sense/is possible)

While the officially planned time is "only" two days, there's the possibility to extend the meeting into the weekend, if necessary.
Both Jan and I live close to the office, in fact for me it's about 15 minutes away :)

Also, by moving it close to the weekend, it makes for a great opportunity to extend the stay and do some sightseeing, and of course have beers with the team ;)
Flights are typically also cheaper if the stay includes a weekend.

If you are interested in attending, you can either comment on this post, send me an email or join our Launchpad discussion team and sign up for the mailing list (free Launchpad account required).

For planning purposes we would need some contact information from those who would like to attend - which we will keep private of course and not use for anything else. Goes without saying, but still :)

Looking forward to seeing you in October!

Thursday Jul 02, 2009

MySQL Proxy 0.7.2 released

The following just went out to our mailing list:

We are happy to announce that MySQL Proxy 0.7.2 is available in a source and binary release for all of our target platforms.

This latest release also brings back Windows support in both the source and binary release.

The list of important changes in this release is:
\* fixed memory leak proxy plugin (#45272)
\* fixed ro-balance.lua (#45408)
\* added CMake build files
\* fixed portability issues for Win32
\* added mysql-proxy-svc on Win32
\* updated INSTALL file to cover all the build steps on win32

Please report any problems on bugs.mysql.com, our Launchpad discussion mailing list or on IRC: #mysql-proxy on irc.freenode.net.

The focus of development is now on MySQL Proxy 0.8.

MySQL Proxy 0.7.2 is currently the recommended version to use unless you develop MySQL Proxy itself.

The packages are available on the MySQL Downloads page.

Monday Apr 27, 2009

Slides from Advanced Query Manipulation with MySQL Proxy

Just quickly dropping the PDF version of the slides of my presentation at this year's MySQL Conference.
Enjoy!

Friday Jan 23, 2009

MySQL Proxy Web-Seminar

I just realized something else Chris has mentioned:

Since MySQL Proxy 0.7.0 is soon to be released, I thought another brief tutorial would be helpful.

Indeed, we are looking to properly release 0.7.0 soon™.

In case you are wondering what’s happening with MySQL Proxy and what the deal with the code’s new home is, please consider making some time for the web-seminar next week.

It’s free but registration is required.

We will also have Carlo Cabanilla talking about his project and I will talk about where we are with Proxy, what our plans are, how you can get involved and what the next steps are. Please bring your questions :)

Query cache and comments

Update

Ok, as Morgan quickly found out: I'm incredibly stupid. Read his comment and you'll know why. Ok, you'll not know why but you'll know that I am.

Really cool to see Chris taking up blogging as well :)

He has written nice little example about inserting comments into queries to distinguish the client’s IP when they are funneled through the proxy. Reading the comments about this little trick making the query cache not work, I couldn’t help thinking that those are wrong. I vaguely remembered that in some recent version this shortcoming was fixed, so I decided to run a little test on 5.1.30 to verify:

mysql> select concat(@@version_comment, ' ', @@version);
+-------------------------------------------+
| concat(@@version_comment, ' ', @@version) |
+-------------------------------------------+
| MySQL Community Server (GPL) 5.1.30-log   | 
+-------------------------------------------+

mysql> show status like 'Qcache_%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_free_blocks      | 1        | 
| Qcache_free_memory      | 16768384 | 
| Qcache_hits             | 0        | 
| Qcache_inserts          | 0        | 
| Qcache_lowmem_prunes    | 0        | 
| Qcache_not_cached       | 0        | 
| Qcache_queries_in_cache | 0        | 
| Qcache_total_blocks     | 1        | 
+-------------------------+----------+
8 rows in set (0.00 sec)

mysql> use test;
Database changed

mysql> create table a (a int);
Query OK, 0 rows affected (0.07 sec)

mysql> insert into a values (1),(2);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> show status like 'Qcache_%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_free_blocks      | 1        | 
| Qcache_free_memory      | 16768384 | 
| Qcache_hits             | 0        | 
| Qcache_inserts          | 0        | 
| Qcache_lowmem_prunes    | 0        | 
| Qcache_not_cached       | 0        | 
| Qcache_queries_in_cache | 0        | 
| Qcache_total_blocks     | 1        | 
+-------------------------+----------+
8 rows in set (0.00 sec)

mysql> select \* from a;
+------+
| a    |
+------+
|    1 | 
|    2 | 
+------+
2 rows in set (0.00 sec)

mysql> show status like 'Qcache_queries_in_cache';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_queries_in_cache | 1        | 
+-------------------------+----------+
1 rows in set (0.00 sec)

So far, so good. Query cache is enabled, I’ve got some basic data in it, and am ready to test my frail memory. Here we go:

mysql> /\* 127.0.0.1:11251 \*/ select \* from a;
+------+
| a    |
+------+
|    1 | 
|    2 | 
+------+
2 rows in set (0.00 sec)

Drumroll….

mysql> show status like 'Qcache_%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_free_blocks      | 1        | 
| Qcache_free_memory      | 16766848 | 
| Qcache_hits             | 1        | 
| Qcache_inserts          | 1        | 
| Qcache_lowmem_prunes    | 0        | 
| Qcache_not_cached       | 0        | 
| Qcache_queries_in_cache | 1        | 
| Qcache_total_blocks     | 4        | 
+-------------------------+----------+
8 rows in set (0.00 sec)

Aha! So it does use the query cache, when there is a comment in front. In fact, running the same query with any amount of whitespace difference in front will use the query cache. Go try it, it works!

So, comments in front are ok, what about the end of the query text? (I’m omitting the data now, it just isn’t that exciting ;)) First, let’s clean the slate, flushing the status vars, recreating the table and inserting the data again, so we don’t have a problem seeing exactly what’s going on:

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> drop table a;
Query OK, 0 rows affected (0.00 sec)

mysql> create table a (a int);
Query OK, 0 rows affected (0.05 sec)

mysql> insert into a values (1),(2);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> show status like 'Qcache%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_free_blocks      | 1        | 
| Qcache_free_memory      | 16768384 | 
| Qcache_hits             | 0        | 
| Qcache_inserts          | 0        | 
| Qcache_lowmem_prunes    | 0        | 
| Qcache_not_cached       | 0        | 
| Qcache_queries_in_cache | 0        | 
| Qcache_total_blocks     | 1        | 
+-------------------------+----------+
8 rows in set (0.00 sec)

mysql> select \* from a;
[...]
2 rows in set (0.00 sec)

mysql> show status like 'Qcache%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_free_blocks      | 1        | 
| Qcache_free_memory      | 16766848 | 
| Qcache_hits             | 0        | 
| Qcache_inserts          | 1        | 
| Qcache_lowmem_prunes    | 0        | 
| Qcache_not_cached       | 0        | 
| Qcache_queries_in_cache | 1        | 
| Qcache_total_blocks     | 4        | 
+-------------------------+----------+
8 rows in set (0.00 sec)

Ok, just what we expected: 1 insert into the query cache (our select) and there’s only one query in the cache at all, also our select \* from a. Let’s run the select with the comment in front again, and then put a different comment at the end.

mysql> /\* from some other host \*/ select \* from a;
[...]
2 rows in set (0.00 sec)

mysql> /\* from some other host \*/ select \* from a; /\* with a comment at the end \*/
[...]
2 rows in set (0.00 sec)

mysql> show status like 'Qcache%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_free_blocks      | 1        | 
| Qcache_free_memory      | 16766848 | 
| Qcache_hits             | 2        | 
| Qcache_inserts          | 1        | 
| Qcache_lowmem_prunes    | 0        | 
| Qcache_not_cached       | 0        | 
| Qcache_queries_in_cache | 1        | 
| Qcache_total_blocks     | 4        | 
+-------------------------+----------+
8 rows in set (0.00 sec)

Whooohoooo! :)

This actually works! In case you are wondering, it will also pick up things like: /\* from some other host \*/ select \* from a /\* with a comment at the end \*/ ; (note the place of the semicolon!)

Now, the obvious next question is: What about comments in the middle of the query? Let’s run a test:

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> show status like 'Qcache%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_free_blocks      | 1        | 
| Qcache_free_memory      | 16766848 | 
| Qcache_hits             | 0        | 
| Qcache_inserts          | 0        | 
| Qcache_lowmem_prunes    | 0        | 
| Qcache_not_cached       | 0        | 
| Qcache_queries_in_cache | 1        | 
| Qcache_total_blocks     | 4        | 
+-------------------------+----------+
8 rows in set (0.00 sec)

mysql> /\* from some other host \*/ select \* /\*and a comment in the middle\*/ from a /\* with a comment at the end \*/ ;
+------+
| a    |
+------+
|    1 | 
|    2 | 
+------+
2 rows in set (0.00 sec)

mysql> show status like 'Qcache%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_free_blocks      | 1        | 
| Qcache_free_memory      | 16765824 | 
| Qcache_hits             | 0        | 
| Qcache_inserts          | 1        | 
| Qcache_lowmem_prunes    | 0        | 
| Qcache_not_cached       | 0        | 
| Qcache_queries_in_cache | 2        | 
| Qcache_total_blocks     | 6        | 
+-------------------------+----------+
8 rows in set (0.00 sec)

And as expected, this does not work. Embedding comments still is a no-go (as is changing the case of any of the keywords or identifiers) because the queries need to compare byte for byte. Except, as we’ve found out, for whitespace in front and back of the query text! Remember, comments are usually regarded as whitespace (and most compilers and interpreters collapse them to a single blank character).

So, what’s the deal about this? It simply means that doing what Chris came up with is perfectly ok for deployments relying on the query cache, provided their MySQL version supports this. Now I have no idea when this change went in, but as 5.1.30 is GA, that’s good enough for me right now. And using the simple commands above you can easily check this for the version you are running, it’ll take you about 1 minute to find out ;)

Oh, and if you do that, please leave a short comment :)

Tuesday Jan 13, 2009

MySQL Proxy code now live

It’s done!

Even if it took a while, too long, really, we finally have the MySQL Proxy code out in the open, where it belongs.

As part of this effort, we would also like to make our development process more transparent. To that effect, we’ve also created a Launchpad team that serves as a discussion hub. It comes with a mailing list, reachable at mysql-proxy-discuss@lists.launchpad.net, where we hope to have some interesting discussions about where to take Proxy, what to implement in what way, and of course to offer advice to users.

There’s not much in terms of “process” yet and quite possibly (and hopefully) it will stay this way, but let me outline some of my hopes and plans for going forward:

Pushing directly into the branches that make up the lp:mysql-proxy project is restricted to those in the MySQL Proxy Developers team, only because of licensing issues.

However, since everyone here feels deeply about open source there is a relatively painless way of getting contributions in:

  1. Talk to us on mysql-proxy-discuss@lists.launchpad.net to make sure
    • the idea is sound
    • no one else is already doing it
    • you know what’s involved
    • other people know what’s happening.
  2. Get the code from lp:mysql-proxy and do the changes you’d like to do.
  3. Then push your branch back to Launchpad.
  4. Propose a merge to let us know that you have something you think we should be looking at.
  5. Pending a code review from us (either Jan or me have to approve it at least) we will merge the changes back to trunk and it will eventually see the light of day in a release.
  6. The above actually has a prerequisite: The contributor has to agree to the Sun Contributor Agreement, as described by the MySQL Forge Wiki page.

    I think you will find the SCA to be very friendly, at least I wish I could’ve signed it back in the day. (Kudos to Jay Pipes have to be given here!).

    Unfortunately we have to require people to sign it, otherwise we cannot accept anything other than the most trivial patches, again due to licensing issues.

  7. There is no point 7.

Let also note that the above (sometimes even with contributor agreements) is nothing totally unnatural in the Open Source world and in fact most projects I’ve contributed to had some kind of a “gatekeeper” mechanism, so please do not be discouraged by it. In fact, cool kids are doing it, too ;)

There are a couple of changes to the layout of the code coming up, but I’ll talk about it when the time comes. In the meantime, please give the latest code a try and report any bugs you find on our bugs system, we are not using Launchpad for them, to avoid losing the overview…

Thursday Aug 21, 2008

Agent vs agent-less monitoring

Baron posted something interesting about agent vs agent-less monitoring in response to Rob.

While reading it, I couldn’t help thinking that the distinction is somewhat misleading, if not wrong.

I’d go so far as to say that agent-less doesn’t exist as such. Why do I say such heresy?

Trivially, you need some piece of software to collect data. With munin you configure a server that triggers scripts on the monitored servers. The set of data sources is governed by what you install in the correct directory on the monitored server. Cacti relies on SNMP heavily and also allows you to write plugins which then connect out from the central Cacti server to the individual hosts to be monitored. I believe you don’t have to do any SNMP but can report data in a different protocol, but since I haven’t done it I don’t know the mechanics.

Now, with Cacti the scripts are all in one central place, which is nice. It’s certainly not the case for munin and that’s where the problem comes in.

Munin in fact is an agent based system as my definition goes. The code necessary to monitor services is not in a central place. For Nagios and Cacti it appears to be “true” agent-less, but effectively you exercise a non-trivial amount of code on the client side as well.

If your definition of an agent (in the monitoring world) is that it runs even when not connected to the monitor server, then yes, there’s a distinction.

However, I prefer to think about it this way: With agent-less systems, you are effectively retransmitting the “agent” each time you retrieve data. Not only do you establish connections at a rate equal to your most frequently collected data, but you are retransmitting much more data in a given time. I find that wasteful.

One other problem I’ve encountered with “agent-less” systems (or rather with “mini-agent” sytems like munin) is that if you have a problem with the network between the monitoring server and the monitored ones, or the monitoring server says bye-bye you must lose all data during that interval. There’s no way to collect the data, simple as that.

In my experience it’s not uncommon that an outage affects only part of your network (especially if you have redundant datacenters - which is why you have them in first place…) and losing much of your monitoring ability without any way to recover from a temporary problem (say duration of about 5-10 minutes) is a bummer, especially in high-traffic environments.

Another interesting point is that agent-less systems are polling constantly. That is fine for sampling data that is constantly changing but for other things it’s far from ideal. The agent can use an “interrupt” model when reporting data back to the monitoring server, because it’s an active component able to report data out of its own accord. Take query analysis for instance. If there’s no data to report, don’t report it. Less work to be done on both sides, less traffic going back and forth. The same goes for setting thresholds for instance, you can report once the threshold was exceeded and maybe continue to send actual values while the readings are out of their allowed range, but stop again once everthing is back to normal. This approach greatly reduces the amount of dataflow, but you cannot do it with a polling scheme.

Agent systems do have their own problems, including higher maintenance efforts, but the idea is that each system is separate and isolated from each other. And quite frankly, I don’t buy the “it’s hard to update agents” argument. There are quite a few free management solutions that make software deployment pretty easy once it is in place. Every place I worked at had some kind of system like that and a performing an update of software took almost no effort at all (that includes MySQL, Apache and even kernel updates). It’s quite possible to include agents in that system and have them deployed automatically, after testing it in a stage environment, of course.

I believe agents have their place and a properly designed system can mitigate almost all maintenance issues.

Obviously, if you want to measure network latency or service availability (is the HTTP service still up and responding), there’s little sense in deploying agents. Just use Nagios, it’s good for something like that!

Tuesday Jul 08, 2008

Proxy webinar and poll

While I realize that part 2 to the Lucene stuff is still forthcoming… (err..sorry?!), there are some interesting things going on:

  1. Giuseppe, John Loehrer and Jimmy Guerrero will host the webinar Designing Scalable Architectures with MySQL Proxy today. Jan and myself will also attend.

    So, if you are thinking about using the proxy or are simply interested in what people do with it, strongly consider attending this one.

  2. There’s a quickpoll up on the MySQL website where we’d like to gather your feedback on what you do or want to do with the proxy.

    Of course, it’s not idle musings but we are looking for areas to focus on in the coming months. This is your chance to make sure we work on the features you want to see. We have a lot of exciting ideas floating around, but we can’t implement them all at once. By asking for opinions we hope to satisfy everyone, us doing useful work and you getting the features you want :)

Wednesday Jun 25, 2008

Calling Java code in MySQL

This post is a follow-up to my talk at JavaDeus 2008 where I showed how you can make use of Lucene inside of MySQL using an experimental branch and some triggers and stored procedures.

Since the process is not entirely obvious and the branch is very experimental, I thought it would be a nice thing to put it online. It’s pretty amazing with how little code (in Java and otherwise) you can implement a working full-text search engine that simply doesn’t get into your way :)

Note: Be advised that the MySQL server we will build is likely to crash at times. Do not use it in production! And of course, always have a safe backup of important data.

This first part will cover the basics on how the get the necessary foundation in place, configuring the server and calling Java UDFs. In the second part, which I hopefully will post tomorrow, I’ll show the juicy rest of getting Lucene going.

To achieve our goal we need several things:

The software versions I used are:

  • Mac OS X 10.5.3 (nothing specific about OS X here, though)
  • Java 1.5.0_13 (you need the SDK, the runtime will not suffice)
  • gcc 4.0.1
  • mysql-5.1-wl820 revision 2584 (as of 2008-06-19)

Ok, off we go.

After you installed Bazaar, you are ready to get the MySQL 5.1-wl820 sources. On the command line do:

$ bzr branch lp:~starbuggers/sakila-server/mysql-5.1-wl820

This might warn you that you cannot push things back to launchpad, but we’ll just ignore it, because we don’t actually want to push changes back.

At this point, I recommend getting up and doing something else, like boarding a flight to Hawaii, because it will take a bit until the sources are downloaded.

When it’s done change into the directory bzr created and configure the server. This is a branch of the MySQL 5.1 main-line, so everything that applies to building the community sources (or Enterprise for that matter) applies to this tree, as well.

You can go the autoreconf route, or be lazy like I am and use the scripts in the BUILD directory (choose the one that applies to your system).

One thing to watch out for, is that the check for a working javac assumes the Java CLASSPATH contains the current directory. Check that your settings (most likely the CLASSPATH environment variable) contain . in there! Skip the first step if you already have . in your classpath (or alter the command if you are not using bash…aah the joys of Unix).

You most certainly will want to specify an installation prefix for this server (otherwise it would be installed in a system-wide location), so don’t forget to fill in the --prefix option.

$ export CLASSPATH=$CLASSPATH:.
$ cd mysql-5.1-wl820
$ ./BUILD/compile-pentium-debug-max-no-ndb --prefix=<where you want the server to live>
[...lots of output...]

Building MySQL might take a bit, depending on your machine, so do something useful while waiting :)

Double check your classpath if the check for javac fails. (Can you tell I always forget that?!)

Aside: If you own a multicore machine, you can speed up the compile considerably by telling the build script to just configure the server and then build it manually (note the -c). Replace the 9 in the make invocation with “number of cores you have”+1:

$ ./BUILD/compile-pentium-debug-max-no-ndb -c --prefix=<where you want the server to live>
[...lots of output...]
$ make -j9
[...yet more output...]

Now is the time to install the server in the location you gave to the build script above. For simplicity I’ll assume that you have no other MySQL server running, please refer to the MySQL docs on how to setup multiple mysqlds on one box.

$ make install
[...still more output...]
$ cd <installation_directory>
$ ./bin/mysql_install_db

Follow the steps mysql_install_db outputs to secure your installation, though for the purpose of this setup guide I have skipped that and am running totally insecure :P (The real reason is that I’m to lazy to remove the anonymous account when testing stuff like this.)

Let’s start the server and make sure everything works so far:

$ ./bin/mysqld_safe &
$ ./bin/mysql
Welcome to the MySQL monitor.  Commands end with ; or \\g.
Your MySQL connection id is 1
Server version: 5.1.26-rc-debug Source distribution

Type 'help;' or '\\h' for help. Type '\\c' to clear the buffer.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema | 
| test               | 
+--------------------+
2 rows in set (0.00 sec)

mysql> quit
Bye

If you can successfully connect we are already halfway there :)

Check that you are connected to the right server (note the server version is 5.1.26-rc-debug) and that there are no databases other that information_schema and test.

Let’s now hit the first roadblock in installing this experimental tree ;)

To load the Java plugin, we will need to connect as root (because we need access to the mysql database):

$ ./bin/mysql -uroot
Welcome to the MySQL monitor.  Commands end with ; or \\g.
Your MySQL connection id is 3
Server version: 5.1.26-rc-debug Source distribution

Type 'help;' or '\\h' for help. Type '\\c' to clear the buffer.

mysql> INSTALL PLUGIN `Java` SONAME "psm_java.so";
ERROR 1126 (HY000): Can't open shared library '/private/tmp/mysql-5.1-wl820-build/lib/mysql/plugin/psm_java.so' (errno: 2 dlopen(/private/tmp/mysql-5.1-wl820-build/lib/mysql/plugin/psm_java.so, 2): image not found)

The problem is that apparently the installation process copies the language plugins into the wrong directory. They should be in lib/mysql/plugin/ but are being installed in lib/mysql/. For security reasons we are not allowed to use paths when loading a plugin, so we need to copy (or symlink) them.

$ cd lib/mysql/plugin
$ ln -s ../psm_\* .

Another try:

$ ./bin/mysql -uroot
mysql> INSTALL PLUGIN `Java` SONAME "psm_java.so";
ERROR 2013 (HY000): Lost connection to MySQL server during query

Whoops. One problem down, and a crash pops its ugly head up. Let’s investigate by looking at the error log (it’s in var/machinename.err). It will contain a line like:

Can't find class: com/mysql/udf/UDFModule080622 20:38:17 [ERROR] Plugin 'Java' init funtion returned error.

Which basically means: Our old friend, the classpath, is wrong. This being an experimental branch, there are error checks missing to make problems painfully obvious. No worries, we’ll just fix the classpath.

But how?

The psm_java plugin contains code that embeds a JVM and as such there must be a place to tell it its startup parameters, like the heapsize and the classpath. There is a configuration file name jvm_options.cnf in the MySQL datadir, which contains those settings. If it doesn’t exist yet, simply create a file of that name.

Mine contains

-Djava.awt.headless=true
-Djava.class.path=/tmp/mysql-5.1-wl820-build/lib/mysql/java_udf.jar:/tmp/mysql-5.1-wl820-build/lib/mysql/java_udf_util.jar:/tmp/mysql-5.1-wl820-build/lib/mysql/java_udf_example.jar

This file also is the place to tell the JVM which garbage collector to use, which can be very important, as the VM is running inside a database server.

/tmp/mysql-5.1-wl820-build is the basedir where I installed MySQL into, the actual .jars you need are in lib/mysql/. java_udf_example.jar contains a couple of samples to get you started. They are built as part of the build process we did earlier and their sources are in the plugin/java_udf/org/example/mysql/udf/ directory of the MySQL source directory.

After creating or modifying jvm_options.cnf, let’s check if our settings worked. Restart mysqld_safe and connect with the client:

$ ./bin/mysqld_safe &
$ ./bin/mysql
Welcome to the MySQL monitor.  Commands end with ; or \\g.
Your MySQL connection id is 1
Server version: 5.1.26-rc-debug Source distribution

Type 'help;' or '\\h' for help. Type '\\c' to clear the buffer.

mysql> SHOW PLUGINS;
+------------+--------+----------------+-------------+---------+
| Name       | Status | Type           | Library     | License |
+------------+--------+----------------+-------------+---------+
| binlog     | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| partition  | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| ARCHIVE    | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| BLACKHOLE  | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| CSV        | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| MEMORY     | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| InnoDB     | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| MyISAM     | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| MRG_MYISAM | ACTIVE | STORAGE ENGINE | NULL        | GPL     |
| Java       | ACTIVE | PSM LANGUAGE   | psm_java.so | GPL     |
+------------+--------+----------------+-------------+---------+
10 rows in set (0.00 sec)

You should see the Java plugin as in the output above, if not, recheck your classpath or try the INSTALL PLUGIN line from above again. The server will automatically load the plugin when it starts, there’s no need to issue the INSTALL PLUGIN command again.

Now let’s call our first UDF in Java. We’ll simply take the examples that were packed up for us in java_udf_example.jar. To do that, connect to the database as user root (unless you want to give the anonymous account EXECUTE privileges) and do the following:

$ ./bin/mysql -uroot
mysql> use test
Database changed
mysql> CREATE FUNCTION my_reverse(string VARCHAR(255))
    -> RETURNS VARCHAR(255)
    -> NO SQL LANGUAGE Java
    -> EXTERNAL NAME 'java_udf_example.jar:org.example.mysql.udf.Reverse.go';
Query OK, 0 rows affected (0.00 sec)
mysql> select my_reverse("looc era sFDUavaJ");
+---------------------------------+
| my_reverse("looc era sFDUavaJ") |
+---------------------------------+
| JavaUDFs are cool               | 
+---------------------------------+
1 row in set (0.01 sec)

Congratulations! You have called your first Java method from inside of MySQL!

This MySQL branch has a slightly different syntax of the CREATE FUNCTION and CREATE PROCEDURE commands, in that it has been extended to allow different language names and can associate an external name with the in-database routine. The actual format of the external name is dependent on the language plugin, of course. For Java we can give it the .jar file and the package (all the way down to the method) of the code we want to call. Other than that it’s the same old Stored Routine syntax.

For reference, here’s the code we have just called (from plugin/java_udf/org/example/mysql/udf/Reverse.java):

package org.example.mysql.udf;

public class Reverse {
    public String go(String str) {
        if (str == null) {
            return null;
        }
        StringBuffer sb = new StringBuffer(str.length());
        for (int i = str.length() - 1; i >= 0; i--) {
            sb.append(str.charAt(i));
        }
        return sb.toString();
    }
}

I love the fact that absolutely nothing in the above Java code refers to JDBC or MySQL at all. It could be anything that takes a string and returns a string, which is a whole lot :)

Now obviously we have to be a bit careful about the code we execute as part of our Stored Routine, because it will run inside the mysqld process. Avoid costly (both in terms of CPU and memory usage) code, don’t block etc. Common sense applies.

In the next post, I will show you how to put it all together and use Lucene to transparently index data flowing into your tables. You will even be able to search that fulltext index from within MySQL :)

CU next time.

About

Kay Roepke

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today