Blobs in MySQL Cluster

If there is one thing that confuses people about tables in MySQL Cluster (including me at times) it is BLOB/TEXT columns.  When NDB was originally created it was not designed to handle BLOB data, so the handling of BLOB data was difficult to implement and is sometimes not exactly what users expect.

How MySQL Cluster BLOBs work

When you create a table in MySQL Cluster which has a BLOB column the first 256 bytes of the BLOB is stored in the main table (and in memory when using disk data tables), subsequent data is then stored in a hidden table (typically split into 2KB rows).  This means there is an extra table for every BLOB or TEXT column in your main table (and extra resource usage).

BLOB locking in MySQL Cluster

These extra tables can cause some problems, firstly with performance (retrieving BLOB data is not very fast) and more importantly with locking.  MySQL Cluster works in READ-COMMITTED transaction isolation level, but this makes things difficult when handling BLOB data to keep the tables in-sync and consistent.  When selecting a row which has a BLOB MySQL Cluster needs to gain a shared lock on that row, and when updating that row an exclusive lock is needed.

This can be a problem if, for example, you update a row in one transaction and select it at another at the same time.  The select will wait for the update to complete because it cannot obtain a shared lock until the exclusive lock is cleared.  This then can lead to the temporary error 'Time-out in NDB, probably caused by deadlock'.

Finally there are certain settings that may need to be increased to handle the large amount of data in a big BLOB, most notably SendBufferMemory and in the case of ndbmtd LongMessageBuffer.

The moral of this story?

1. If you can use VARCHAR/VARBINARY instead, this will avoid these problems
2. Be very careful about writing your applications with BLOBs, if they are large it may well be better to keep them stored separately on a SAN and have your application retrieve them
3. Keep your transactions short so locking time is kept to a minimum

Comments:

Hey there,

Great blog, and very informative.

We're running a live system with about 6 million records which are inserted, selected and updated regularly... these were originally set-up under MyISAM with a TEXT field because at the time VARCHAR(255) was the limit.

Under NDB it is working well, but I think we'd get an improvement by changing the TEXT field to a VARCHAR(2048). Do you think that would be of benefit? Also what's the safest way to alter the table? Would an ALTER TABLE work whilst the cluster is online (and would it require double the RAM to create a copy)?

Posted by AndyC on August 10, 2010 at 06:32 PM BST #

Post a Comment:
  • HTML Syntax: NOT allowed
About

LinuxJedi is an ex-MySQL Senior Technical Support Engineer who previously worked at Oracle and specialised in MySQL Cluster as well C/C++ APIs.

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today