MySQL Cluster Uses

MySQL Cluster can be used as a general purpose transactional storage engine, but if you convert all your InnoDB tables to it and connect your application straight to it you may not see the performance you were hoping for.  This is because MySQL Cluster was originally designed for real-time telecommunications applications (such as RADIUS servers).  It has slowly been modified to become more general purpose and improvements are being made every day but there are still some performance considerations which go with this.  In some cases tweaking your schema and/or queries can help performance dramatically, so I shall try and outline some of things to watch for here.

Indexes

The fastest type of lookup you can do in Cluster is a primary key equality lookup (ie. SELECT \* FROM table WHERE pkey = 2).  This is because the primary key is stored as a hash index as well as an optional ordered index.  This hash index is used to partition the data between the data nodes.  MySQL is smart enough to process the hash and go directly to the data node with the data.

When running a query which uses an ordered index (or unique hash index) the query is sent to the Transaction Coordinator in one node which then asks the Local Query Handlers in one or more nodes to return rows that match this query.  The Transaction Coordinator is aware of the indexes so knows which nodes to ask to process the query.  This in general is slightly slower and can actually perform worse for ordered indexes as more data nodes are added because more Local Query Handlers need to be contacted.

Finally if the query is a table scan the Transaction Coordinator must ask all the Local Query Handlers to search for the data.  This is much slower.

Joins

Joins in MySQL Cluster do not currently perform well.  Internally the second table must be queried to match every row returned by the first table, this can mean a lot of network traffic which can slow things down.  There is work in progress to improve this by pushing the join condition down into the data nodes, this will give a massive performance increase when using joins.  More information about this (called SPJ) can be seen in Jonas' blog.

BLOBs and TEXT

As I have already mentioned in this blog BLOBs (and TEXT) columns require a separate table to hold most of the BLOB data.  This can cause performance problems and locking issues so if possible VARCHAR or VARBINARY should be used instead.

Transactions

MySQL Cluster currently only supports READ COMMITTED transaction isolation level so if you are used to REPEATABLE READ that you typically get from engines such as InnoDB you may want to check that this will not cause problems with your application.

In general we recommend transactions should be short, limited to just a few queries.  MySQL Cluster can give temporary errors for a number of reasons, these do not happen often but will mean that the entire transaction will need to be started again.  For many applications it will be very hard to replay a transaction which has been running for a whole hour.  Also committing very large transactions can lead to GCP Stop.

Networking

Data nodes must be on the same private network, period.  Preferably on the same switch.  The protocol all the nodes use is insecure so must not be on a public network (with the exception of the mysqld servers).  Also latency can be a problem with data nodes, if there is a short period of large latency or latency in general is increased due to the data nodes being in physically different locations you will have performance problems and worse regular node failures are to be expected.  Finally if the network has other traffic running on it this can also cause performance problems.

Comments:

[Trackback] This post was mentioned on Twitter by LinuxJedi: My new MySQL Cluster post on how your application should use it: http://blogs.sun.com/LinuxJedi/entry/mysql_cluster_uses

Posted by uberVU - social comments on February 05, 2010 at 11:30 AM GMT #

"This is because the primary key is stored as a hash index as well as an ordered index."

Forgive me for being pedantic, but wouldn't it be more accurate to write:

"This is because the primary key is stored as a hash index as well as an \*optional\* ordered index."

Posted by Rob Wultsch on February 05, 2010 at 06:21 PM GMT #

Hi Rob,

You are correct, many thanks for pointing out my error. I shall fix it shortly.

Posted by LinuxJedi on February 06, 2010 at 12:24 AM GMT #

In the FAQ of mysql cluster they say :

[MySQL Cluster is intended to be used in a high-bandwidth environment, with computers connecting via TCP/IP. Its performance depends directly upon the connection speed between the cluster's computers...]

When you say "Data nodes must be on the same private network, period." you mean that is mandatory? Like the nodes will not connect to the management server? We are trying to connect 2 nodes from different locations with different real ip`s but it doesn`t work. We haven`t seen this requirement in the docs.

We are trying to make this work for 1 week, read all documentations, tryed centos and ubuntu distro.

Our management server show this:
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 (not connected, accepting connect from 195.66.x.122)
id=3 @193.230.x.214 (mysql-5.1.39 ndb-7.0.9, starting, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @195.66.x.36 (mysql-5.1.39 ndb-7.0.9)

[mysqld(API)] 2 node(s)
id=4 (not connected, accepting connect from any host)
id=5 (not connected, accepting connect from any host)

When we check the status of the node 3 it is in phase 0 of starting.

Do you think a VPN connection will help?

Thanks.

Posted by Dragos on June 23, 2010 at 09:53 AM BST #

Hi Dragos,

It will work with remote connections between data nodes. But you will find node failures happen very often, mostly due to heartbeat failures and performance will be really bad (and it is unsupported).

To make your configuration work the servers will need full access to each other to open ports up (so in your case, yes, a VPN).

Node 3 won't get out of phase 0 until Node 2 is connected.

Posted by LinuxJedi on June 23, 2010 at 10:06 AM BST #

Thank you very much.

Posted by Dragos on June 24, 2010 at 02:47 AM BST #

Post a Comment:
  • HTML Syntax: NOT allowed
About

LinuxJedi is an ex-MySQL Senior Technical Support Engineer who previously worked at Oracle and specialised in MySQL Cluster as well C/C++ APIs.

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today