By Mat Keep on Mar 30, 2012
Yesterday we ran a webinar discussing the demands of next generation web services and how blending the best of relational and NoSQL technologies enables developers and architects to deliver the agility, performance and availability needed to be successful.
Attendees posted a number of great questions to the MySQL developers, serving to provide additional insights into areas like auto-sharding and cross-shard JOINs, replication, performance, client libraries, etc. So I thought it would be useful to post those below, for the benefit of those unable to attend the webinar.
Before getting to the Q&A, there are a couple of other resources that maybe useful to those looking at NoSQL capabilities within MySQL:
- MySQL Cluster demo, including NoSQL interfaces, auto-sharing, high availability, etc.
So here is the Q&A from the event
Q. Where does MySQL Cluster fit in to the CAP theorem?
A. MySQL Cluster is flexible. A single Cluster will prefer consistency over availability in the presence of network partitions. A pair of Clusters can be configured to prefer availability over consistency. A full explanation can be found on the MySQL Cluster & CAP Theorem blog post.
Q. Can you configure the number of replicas? (the slide used a replication factor of 1)
Yes. A cluster is configured by an .ini file. The option NoOfReplicas sets the number of originals and replicas: 1 = no data redundancy, 2 = one copy etc. Usually there's no benefit in setting it >2.
Q. Interestingly most (if not all) of the NoSQL databases recommend having 3 copies of data (the replication factor).
Yes, with configurable quorum based Reads and writes. MySQL Cluster does not need a quorum of replicas online to provide service. Systems that require a quorum need > 2 replicas to be able to tolerate a single failure. Additionally, many NoSQL systems take liberal inspiration from the original GFS paper which described a 3 replica configuration. MySQL Cluster avoids the need for a quorum by using a lightweight arbitrator. You can configure more than 2 replicas, but this is a tradeoff between incrementally improved availability, and linearly increased cost.
Q. Can you have cross node group JOINS? Wouldn't that run into the risk of flooding the network?
MySQL Cluster 7.2 supports cross nodegroup joins. A full cross-join can require a large amount of data transfer, which may bottleneck on network bandwidth. However, for more selective joins, typically seen with OLTP and light analytic applications, cross node-group joins give a great performance boost and network bandwidth saving over having the MySQL Server perform the join.
Q. Are the details of the benchmark available anywhere? According to my calculations it results in approx. 350k ops/sec per processor which is the largest number I've seen lately
The details are linked from Mikael Ronstrom's blog
The benchmark uses a benchmarking tool we call flexAsynch which runs parallel asynchronous transactions. It involved 100 byte reads, of 25 columns each. Regarding the per-processor ops/s, MySQL Cluster is particularly efficient in terms of throughput/node. It uses lock-free minimal copy message passing internally, and maximizes ID cache reuse. Note also that these are in-memory tables, there is no need to read anything from disk.
Q. Is access control (like table) planned to be supported for NoSQL access mode?
Currently we have not seen much need for full SQL-like access control (which has always been overkill for web apps and telco apps). So we have no plans, though especially with memcached it is certainly possible to turn-on connection-level access control. But specifically table level controls are not planned.
Q. How is the performance of memcached APi with MySQL against memcached+MySQL or any other Object Cache like Ecache with MySQL DB?
With the memcache API we generally see a memcached response in less than 1 ms. and a small cluster with one memcached server can handle tens of thousands of operations per second.
Q. Can .NET can access MemcachedAPI?
Yes, just use a .Net memcache client such as the enyim or BeIT memcache libraries.
Q. Is the row level locking applicable when you update a column through memcached API?
An update that comes through memcached uses a row lock and then releases it immediately. Memcached operations like "INCREMENT" are actually pushed down to the data nodes. In most cases the locks are not even held long enough for a network round trip.
Q. Has anyone published an example using something like PHP? I am assuming that you just use the PHP memcached extension to hook into the memcached API. Is that correct?
Not that I'm aware of but absolutely you can use it with php or any of the other drivers
Q. For beginner we need more examples.
Take a look here for a fully worked example
Q. Can I access MySQL using Cobol (Open Cobol) or C and if so where can I find the coding libraries etc?
A. There is a cobol implementation that works well with MySQL, but I do not think it is Open Cobol. Also there is a MySQL C client library that is a standard part of every mysql distribution
Q. Is there a place to go to find help when testing and/implementing the NoSQL access?
If using Cluster then you can use the firstname.lastname@example.org alias or post on the MySQL Cluster forum
Q. Are there any white papers on this?
Yes - there is more detail in the MySQL Guide to NoSQL whitepaper
If you have further questions, please don’t hesitate to use the comments below!