MySQL perspectives from a SQL Server guru
By Henrik Ingo on Aug 13, 2008
Ben Kaufman at SQLServerCentral introduces MySQL to the SQL Server DBA crowd. All in all his views seem to be fairly positive, in particular the MySQL Cluster experience:
NDB is the gem of MySQL, originally developed by Ericson to track cell phone calls this is a share nothing cluster engine stored in memory. This is a true cluster that supports both high availability and load balancing. [...]
This engine is similar to synchronous mirroring in SQL Server in that it is a 2-phase commit, the difference being the commit is done in memory at the data layer not the log. Logs are hardened at a later time, with the theory being that since data is committed on multiple nodes the data is safe and doesn't require a log flush as part of the transaction. [...]
For pure performance the cluster is comparable with a single instance of SQL Server. I've found on selects it beats SQL Server slightly as long as the data on SQL Server is in memory. This gap widens as more connections are made. Writes depend on whether SQL Server is using write back cache on its controller, in the case it is, it beats NDB, due to NDBs 2-phase commit. Without the controller cache NDB beat SQL. However this is not apples to apples. When compared to SQL Server synchronous mirroring NDB wins hands down. The cost associated with NDB is that it resides in memory (5.1 allows you to keep non indexed data on disk), and therefore your dataset is limited by the amount of memory you use. [...]
With all the negatives put aside if you have an application that requires redundancy, and fast inserts and selects on a single table, NDB is the best product out there. We've been running it for almost 18 months and it's been rock solid. Compared with other SQL Server and other MySQL implementations this has required the least amount of attention. One final note this will not run on Windows.
Also other storage engines get a good evaluation:
Myisam whether intentional or not is built and optimized for read-only datasets. Some of the features that make this the case is the fact that it doesn't support transactions, so taking user updates would be dangerous, but you don't incur the overhead of the transactional log. It performs locking at the table level not the row or page, which is not the best for active OLTP systems, unless the system is only doing inserts. On an application only performing inserts it performs well because it has the ability to perform concurrent inserts and selects. This is because data is appended to the end of the file, and a table lock is not issued, allowing for selects of the data not to be blocked.
Wow, even I had missed this fact about MyISAM. There's always so much talk about MyISAM performing badly on "concurrent writes", I didn't realize that pure INSERTs are not a problem only UPDATEs. I immediately tested this and on a 2 core system (my laptop) the performance of bulk insert doubled when using 10 threads instead of 1.
Thanks Ben, this is useful info. There are many networking use cases where you have to collect lots of data from network elements in some kind of aggregator or mediator, and there the data is then queried, forwarded and deleted later on, but the data is never UPDATEd. It seems MyISAM is a good fit for this kind of use case after all, not just pure datawarehousing. (Especially with PARTITIONing, where you can delete full tables/partitions and not just a bunch of rows.)