Tuesday Aug 05, 2008

MySQL Cluster basics: Why using a primary key is even more important

While we were mostly excited to become part of Sun, there was at least one drawback too. MySQL had a worldwide policy of "Scandinavian style" 5-week vacations per year (4 summer weeks, 1 winter week). Sun doesn't have such a worldwide policy, so most of us lost some vacation days there. But there is one group of employees that still enjoys this benefit... those of us who actually live in Scandinavia!

Those 4 weeks are, however, now gone for my part :-( So I thought I'd make a soft start to work by a) doing almost nothing yesterday and b) blogging today. A long, long time ago, before I left for my vacation, Mark Callaghan requested more blog posts about MySQL Cluster, so I thought why not start with one right now. As previously noted there are many good MySQL Cluster bloggers out there, but admittedly some of their blog postings can be very deeply technical. With this one, however, I just want to highlight some basic aspects of how MySQL Cluster works and in fact how its performance differs from InnoDB and MyISAM based databases. (But it must be added, if you want "getting started" level material, then Johan Andersson's blog with posts like A great config.ini and The RTFM is a must read.)

Some time in June - before my vacation, so long ago I can barely remember it - I was getting a MySQL Cluster setup up to speed with a customer. Before getting to the real data, the customer had decided to "benchmark" his setup by simply inserting 100 000 integers in a batch:

CREATE TABLE t (i INT) engine=ndbcluster;
INSERT INTO t VALUES (1);
... (100000 times, one row per statement)

This test came out with MySQL Cluster looking very bad: more than 2 minutes. The same test on InnoDB took 3.7s. (The InnoDB result also looked unrealistic at first - on a standard PC you wouldn't get anywhere near that fast - but since those RAID controllers have a DDR RAM chip as write cache, it basically made this an in-memory operation of 400kB or so.)

So what to do?

MySQL Cluster lesson 1: Batching, batching!

The major cause of bad performance in MySQL Cluster is often the fact that cluster nodes are connected to each other over a network. This introduces latency and overhead in every single transaction you do - typically at least 3 nodes need to communicate with each other, send the data, agree that it can be committed, and so on. The way to minimize this problem is batching: if you send more data in one batch, the effect of TCP/IP overhead is smaller.

In this particular example, of course, the effect of TCP/IP overhead was at its worst: we were sending only one INT at a time! So we turned on transaction_allow_batching and turned off ndb_force_send. Also remember to turn off AUTOCOMMIT, so there actually is a transaction to be batched!
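To make this concrete, here is a minimal sketch of what that looks like on the SQL side (the exact variable scopes and values are assumptions on my part, so check the documentation for your MySQL Cluster version):

SET SESSION transaction_allow_batching = 1;  -- let NDB batch the operations inside a transaction
SET SESSION ndb_force_send = 0;              -- don't force every request out to the data nodes immediately
SET autocommit = 0;                          -- make sure there actually is a transaction to batch

BEGIN;
INSERT INTO t VALUES (1);
INSERT INTO t VALUES (2);
-- ... the remaining rows ...
COMMIT;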

MySQL Cluster lesson 2: Parallelize, parallelize!

The great thing about MySQL Cluster is its write scalability, which (for "suitable applications") is practically infinite. With normal MySQL you can easily get read scalability with things like replication, but write scalability is a tougher nut to crack. In MySQL Cluster, you add more data nodes and you get more write throughput. The partitioning is internal to the cluster, so you don't have to mess with shards or anything in your application; one mysql client connection is all you need. Live installations of MySQL Cluster (such as an HLR, the heart of a mobile phone network) will do over 100k writes per second.

But to actually get there, you need to also parallelize your operations. Inserting 100k INTEGERs in one batch should be fast, but separating it into 100 batches should be lightning fast, since they can all happen at the same time and MySQL Cluster can absorb that load.
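As a rough illustration of the idea (the batch boundaries below are made up), you simply open several client connections and let each one insert its own slice at the same time:

-- Connection 1:
BEGIN;
INSERT INTO t VALUES (1), (2), (3) /* ... up to 1000 ... */;
COMMIT;

-- Connection 2, running concurrently:
BEGIN;
INSERT INTO t VALUES (1001), (1002), (1003) /* ... up to 2000 ... */;
COMMIT;

-- ... and so on, one connection per batch.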

So with these changes... we got from 2 minutes down to 50 seconds. Still not the 1 second our marketing material claims!

MySQL Cluster lesson 3: Why PRIMARY KEY is important

With all of my own tricks used up, I actually had to turn to my more senior colleagues for advice on what to do next. One reply was the jackpot:

The thing is, this simple example didn't have a primary key defined. In real-life situations you of course always have a primary key, and that is also the reason I had forgotten this important piece of advice: with MySQL Cluster you must always always always remember to use one, even for silly examples like this. The reason is that MySQL Cluster (by default) uses the primary key for its internal partitioning between data nodes. So if there is no primary key, it will automatically add a hidden AUTOINCREMENT column to act as one. This leads to 2 problems:

1) Since you are not using an explicit primary key, the mysqld node will end up randomly connecting to whatever ndbd node it happens to choose, from which the transaction is re-routed to the right data nodes. This unnecessarily adds to (doubles!) the network latency.

2) When inserting, fetching the unique autoincrement value acts like a global lock for all inserts. You can parallelize all you want, but it doesn't get any faster. (There is ndb_autoincrement_prefetch_sz to prefetch a number of keys in a batch - see the sketch below - but still, always always use a primary key.)
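If you do end up relying on the hidden autoincrement column anyway, prefetching a range of values per round trip softens the contention a little. A hedged one-liner (the session scope and the value are my assumptions, not a recommendation):

SET SESSION ndb_autoincrement_prefetch_sz = 256;  -- prefetch a batch of autoincrement values per round trip (value is illustrative)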

In this example, parallelizing actually made performance much worse than the 50 seconds we were at, since the multiple threads were now all competing for the autoincrement lock.

Adding an explicit primary key to the table and the inserts solved the problem and got the time down to 10 seconds, and with some further tuning to 5 seconds. This is still not lightning fast, but at this point the customer was happy and chose to focus on their real application instead. (After the tuning that led to 5 seconds, the next significant improvements would probably have come from skipping SQL and using the native C++ or Java NDB API, and from switching from Ethernet to the proprietary Dolphin SCI interconnect. Both of these further reduce latency.)
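For reference, a sketch of what the fixed test might have looked like (the column name and layout here are my guesses, not the customer's actual schema):

CREATE TABLE t (
  i INT NOT NULL,
  PRIMARY KEY (i)
) ENGINE=ndbcluster;

-- Explicit primary key values: no hidden autoincrement lock, and NDB can
-- partition on i and route each row straight to the owning data node.
INSERT INTO t VALUES (1), (2), (3) /* ... */;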

So to summarize: batch, parallelize, and always always define a primary key.

While this has been a trivial example of something not relevant to the real world, I hope it has helped highlight some behavioural characteristics of MySQL NDB Cluster.

PS. In June there was still a lot of confusion around the new version numbering with MySQL NDB Cluster. Are you all clear on that now, or do we need to go through it once more in another blog post?

Monday May 19, 2008

Introducing MySQL's telco endeavours

We (in the Telecom team at MySQL) have been debating whether we should call this blog 'MySQL in Telco' or 'MySQL in Communications'. Naming discussions tend to take a long time, and this one was no exception.

From a US perspective, it appears that Telecom is an outdated term. Wireless carriers and cable television companies do not consider themselves Telecom companies. Maybe for this reason, large US-headquartered vendors including Sun, HP, IBM and Oracle all have a 'Communications & Media' practice.

From a European perspective, Telecom is used for equipment vendors and service providers. Companies like Logica, Cap Gemini, Atos Origin and TietoEnator refer to the vertical as 'Telecom & Media'.

Of the global SIs in India, Wipro and TCS refer to Telecom, while Infosys talks about Communications.

As a working title we at one time used the 'MySQL blablabla' blog, and funnily enough, there was a compromise suggestion to actually use that as the name, since 'blablabla' is a term that captures something fundamental about 'Communication'.

So, after an interesting debate, we finally concluded that maybe it does not matter, and settled on 'MySQL in Communications'. However, telco and communications can be used interchangeably.

Coming back to the real business, MySQL has been making some solid progress in the telco space over the past few years. It started with the acquisition of the NDB Cluster database from Ericsson in September 2003. What a culture clash! Open source vs. proprietary software development. Having 2 masters to please: customers AND community. Rather than taking several hours, installation should take 15 minutes in order to satisfy impatient community users. Early releases would be made available to everybody, rather than to a few selected friendly customers.

The integration took longer than everybody had expected. It took over a year to get the first release of MySQL Cluster out. Not the best release, but the technology was complex. Shared-nothing database clustering! How do you make 10 machines act as one database cluster, accessed as a single database? How do you handle failures? How do you reintroduce a database node into a cluster running non-stop at 20,000 tps and maintain data integrity? How do you provide mainframe reliability on commodity hardware? Quite a few challenges to address, but it did make a lot of sense to address them. The world is now building infrastructure using scale-out architectures to serve massive online communities of millions of users, while the majority of database products on the market were designed in the 70s or 80s for more modest use cases.

Today, MySQL has a pretty healthy business in the area of subscriber data management for telecom networks, and global telecom vendors as well as small telecom startups build carrier-grade, database-driven infrastructure on top of the carrier product line.


In this blog we (the MySQL Telecom team) intend to write about some of the database trends that we see in the converging telecom and internet markets, and what we are doing in that area.


