MySQL Cluster Uses
By LinuxJedi on Feb 05, 2010
MySQL Cluster can be used as a general purpose transactional storage engine, but if you convert all your InnoDB tables to it and connect your application straight to it you may not see the performance you were hoping for. This is because MySQL Cluster was originally designed for real-time telecommunications applications (such as RADIUS servers). It has slowly been modified to become more general purpose and improvements are being made every day but there are still some performance considerations which go with this. In some cases tweaking your schema and/or queries can help performance dramatically, so I shall try and outline some of things to watch for here.
The fastest type of lookup you can do in Cluster is a primary key equality lookup (ie. SELECT \* FROM table WHERE pkey = 2). This is because the primary key is stored as a hash index as well as an optional ordered index. This hash index is used to partition the data between the data nodes. MySQL is smart enough to process the hash and go directly to the data node with the data.
When running a query which uses an ordered index (or unique hash index) the query is sent to the Transaction Coordinator in one node which then asks the Local Query Handlers in one or more nodes to return rows that match this query. The Transaction Coordinator is aware of the indexes so knows which nodes to ask to process the query. This in general is slightly slower and can actually perform worse for ordered indexes as more data nodes are added because more Local Query Handlers need to be contacted.
Finally if the query is a table scan the Transaction Coordinator must ask all the Local Query Handlers to search for the data. This is much slower.
Joins in MySQL Cluster do not currently perform well. Internally the second table must be queried to match every row returned by the first table, this can mean a lot of network traffic which can slow things down. There is work in progress to improve this by pushing the join condition down into the data nodes, this will give a massive performance increase when using joins. More information about this (called SPJ) can be seen in Jonas' blog.
BLOBs and TEXT
As I have already mentioned in this blog BLOBs (and TEXT) columns require a separate table to hold most of the BLOB data. This can cause performance problems and locking issues so if possible VARCHAR or VARBINARY should be used instead.
MySQL Cluster currently only supports READ COMMITTED transaction isolation level so if you are used to REPEATABLE READ that you typically get from engines such as InnoDB you may want to check that this will not cause problems with your application.
In general we recommend transactions should be short, limited to just a few queries. MySQL Cluster can give temporary errors for a number of reasons, these do not happen often but will mean that the entire transaction will need to be started again. For many applications it will be very hard to replay a transaction which has been running for a whole hour. Also committing very large transactions can lead to GCP Stop.
Data nodes must be on the same private network, period. Preferably on the same switch. The protocol all the nodes use is insecure so must not be on a public network (with the exception of the mysqld servers). Also latency can be a problem with data nodes, if there is a short period of large latency or latency in general is increased due to the data nodes being in physically different locations you will have performance problems and worse regular node failures are to be expected. Finally if the network has other traffic running on it this can also cause performance problems.