Monday Oct 14, 2013

Join Those Adopting the MySQL 5.6 Database

MySQL Server 5.6 has seen fast adoption - the list of performance improvements and features make a convincing argument all on their own.

Join those adopting the MySQL database by taking the MySQL for Database Administrators training course.

This 5-day course teaches you all the core dba skills including what MySQL Server 5.6 brings you in terms of Replication. You will learn about:

  • Managing the MySQL Binary Log
  • MySQL replication threads and files
  • Using MySQL Utilities for Replication
  • Designing Complex Replication Topologies
  • Multi-Master and Circular Replication
  • Performing a Controlled Switchover
  • Monitoring and Troubleshooting MySQL Replication
  • Replication with Global Transaction Identifiers (GTIDs)

You can take this 5-day live instructor-led course as a:

  • Live-virtual event: Take this course from your own desk, no travel required. Choose from a selection of events to suit different timezones.
  • In-class event: Travel to an education center to take this course. Below is a selection of events already on the schedule.

 Location

 Date

 Delivery Language

 Brussels, Belgium

 2 December 2013

 English

 London, England

 28 October 2013

 English

 Aix-en-Provence, France

2 December 2013 

 French 

 Bordeaux, France

2 December 2013 

 French

 Nice, France

 4 November 2013

 French

Puteaux, France 

14 October 2013 

 French

Budapest, Hungary 

 4 November 2013

 Hungarian

Belfast, Ireland 

 16 December 2013

 English

 Milan, Italy

 2 December 2013

  Italian 

 Rome, Italy

 18 November 2013

 Italian

 Utrecht, Netherlands

 2 December 2013

Dutch 

 Gdynia, Poland

 21 October 2013

Polish 

Istanbul, Turkey 

 25 November 2013

 Turkish

 Bangalore, India

 18 November 2013

 English

 Jakarta Barat, Indonesia

 16 December 2013

  English 

 Makati City, Philippines

 4 November 2013

 English

 Pasig City, Philippines

 4 November 2013

 English

 Casablanca, Morocco

 25 November 2013

 English

 Johannesburg, South Africa

 9 December 2013

 English

 Belmont, CA, United States

14 October 2013 

 English

San Pedro Garza Garcia, Mexico 

9 December 2013 

 Spanish

 Lima, Peru

 14 October 2013

 Spanish

Prove your MySQL DBA skills by taking the Oracle Certified Professional, MySQL 5.6 Database Administrator exam. And until Dec 14th, you can take this exam at a must reduced price, for example only $50 for exams located in the United States.

For more information about the authentic MySQL curriculum, go to http://oracle.com/education/mysql.

Thursday May 23, 2013

MySQL at DrupalCamp, Goteborg - Sweden

DrupalCamp Conference is hold in Göteborg, Sweden on May 25, 2013.

MySQL is part of this show and I would like to invite you to our session on "Scalability and Availability with MySQL Replication" given by Sven Sandberg on Saturday-May 25, 2013 @ 13:45-14:25!

So, if you are close to Göteborg or attending this event, do not forget to come to listen Sven's talk!


Wednesday Apr 10, 2013

MySQL Replication: Self-Healing Recovery with GTIDs and MySQL Utilities

MySQL 5.6 includes a host of enhancements to replication, enabling DevOps teams to reliably scale-out their MySQL infrastructure across commodity hardware, on-premise or in the cloud.

One of the most significant enhancements is the introduction of Global Transaction Identifiers (GTIDs) where the primary development motivation was:

- enabling seamless failover or switchover from a replication master to slave

- promoting that slave to the new master

- without manual intervention and with minimal service disruption.

You can download the new MySQL Replication High Availability Guide to learn more.  The following sections provide an overview of how GTIDs and new MySQL utilities work together to enable self-healing replication clusters.  

GTIDs

To understand the implementation and capabilities of GTIDs, we will use an example illustrated below:

- Server “A” crashes

- We need to failover to one of the slaves, promoting it to master

- The remaining server becomes a slave of that new master

Figure 1: MySQL Master Failover and Slave Promotion

As MySQL replication is asynchronous by default, servers B and C may not have both replicated and executed the same number of transactions, i.e. one may be ahead of the other. Consider the following scenarios:

Scenario #1

- Server B is ahead of C and B is chosen as the new master;

- [Then] Server C needs to start to replicate from the first transaction in server B that it is yet to receive;

Scenario #2

- Server C has executed transactions that have so far not been received by Server B, but the administrator designates Server B as the new master (for example, it is configured with higher performing hardware than Server C).

- Server B therefore needs to execute later transactions from Server C, before assuming the role of the master, otherwise lost transactions and conflicts can ensue.

GTIDs apply a unique identifier to each transaction, so it is becomes easy to track when it is executed on each slave. When the master commits a transaction, a GTID is generated which is written to the binary log prior to the transaction. The GTID and the transaction are replicated to the slave.

If the slave is configured to write changes to its own binary log, the slave ensures that the GTID and transaction are preserved and written after the transaction has been committed.

The set of GTIDs executed on each slave is exposed to the user in a new, read-only, global server variable, gtid_executed. The variable can be used in conjunction with GTID_SUBTRACT() to determine if a slave is up to date with a master, and if not, which transactions are missing.

A new replication protocol was created to make this process automatic. When a slave connects to the master, the new protocol ensures it sends the range of GTIDs that the slave has executed and committed and requests any missing transactions. The master then sends all other transactions, i.e. those that the slave has not yet executed. This is illustrated in the following example (note that the GTID numbering and binlog format is simplified for clarity):

Figure 2: Automatically Synchronizing a New Master with it's Slaves

In this example, Server B has executed all transactions before Server A crashed. Using the MySQL replication protocol, Server C will send “id1” to Server B, and then B will send “id 2” and “id3” to Server C, before then replicating new transactions as they are committed. If the roles were reversed and Server C is ahead of Server B, this same process also ensures that Server B receives any transactions it has so far not executed, before being promoted to the role of the new master.

We therefore have a foundation for reliable slave promotion, ensuring that any transactions executed on a slave are not lost in the event of an outage to the master.

MySQL Utilities for GTIDs

Implemented as Python-based command-line scripts, the Replication utilities are available as a standalone download as well as packaged with MySQL Workbench. The source code is available from the MySQL Utilities Launchpad page.

The utilities supporting failover and recovery are components of a broader suite of MySQL utilities that simplify the maintenance and administration of MySQL servers, including the provisioning and verification of replication, comparing and cloning databases, diagnostics, etc. The utilities are available under the GPLv2 license, and are extendable using a supplied library. They are designed to work with Python 2.6 and above.

Replication Utility: mysqlfailover

While providing continuous monitoring of the replication topology, mysqlfailover enables automatic or manual failover to a slave in the event of an outage to the master. Its default behavior is to promote the most viable slave, as defined by the following slave election criteria.

• The slave is running and reachable;

• GTIDs are enabled;

• The slaves replication filters do not conflict;

• The replication user exists;

• Binary logging is enabled.

Once a viable slave is elected (called the candidate), the process to retrieve all transactions active in the replication cluster is initiated. This is done by connecting the candidate slave to all of the remaining slaves thereby gathering and executing all transactions in the cluster on the candidate slave. This ensures no replicated transactions are lost, even if the candidate is not the most current slave when failover is initiated.

The election process is configurable. The administrator can use a list to nominate a specific set of candidate slaves to become the new master (e.g. because they have better performing hardware).

At pre-configured intervals, the utility will check to see if the server is alive via a ping operation, followed by a check of the connector to detect if the server is still reachable. If the master is found to be offline or unreachable, the utility will execute one of the following actions based on the value of the failover mode option, which enables the user to define failover policies:

• The auto mode tells the utility to failover to the list of specified candidates first and if none are viable, search the list of remaining slaves for a candidate.

• The elect mode limits election to the candidate slave list and if none are viable, automatic failover is aborted.

• The fail mode tells the utility to not perform failover and instead stop execution, awaiting further manual recovery actions from the DevOps team.

Via a series of Extension Points users can also bind in their own scripts to trigger failover actions beyond the database (such as Virtual IP failover). Review the mysqlfailover documentation for more detail on configuration and options of this utility.

Replication Utility: mysqlrpladmin

If a user needs to take a master offline for scheduled maintenance, mysqlrpladmin can perform a switchover to a specific slave (called the new master). When performing a switchover, the original master is locked and all slaves are allowed to catch up. Once the slaves have read all events from the original master, the original master is shutdown and control is switched to the new master.

There are many Operations teams that prefer to take failover decisions themselves, and so mysqlrpladmin provides a mechanism for manual failover after an outage to the master has been identified, either by using the health reporting provided by the utilities, or by alerting provided by a tool such as the MySQL Enterprise Monitor.

It is also possible to easily re-introduce recovered masters back to the replication topology by using the demote-master command and initiating a switchover to the recovered master.

Review the mysqlrpladmin documentation for more detail on configuration and options of the utility.

Resources to Get Started

In addition to the documentation, two additional resources are useful to enable you to learn more:

- MySQL replication utilities tutorial video

- MySQL replication guide: building a self healing replication topology

Summary

The combination of options to control failover or switchover and the ability to inform applications of the failover event are powerful features that enable self-healing for critical applications.

Thursday Apr 04, 2013

MySQL 5.6 Replication: All That Is New, On-Demand

The new MySQL 5.6 GA release delivers a host of new capabilities to support developers releasing new services faster, with more agility, performance and security .

One of the areas with the most far-reaching set of enhancements is MySQL replication used by the largest web, mobile and social properties to horizontally scale highly-available MySQL databases across distributed clusters of low cost, commodity servers.

A new on-demand MySQL 5.6 replication webinar takes you on a guided tour through all of those enhancements, including:

- 5x higher master and slave performance with Binary Log Group Commit, Multi-Threaded Slaves and Optimized Row-based replication

- Self healing replication with Global Transaction Identifiers (GTIDs), utilities and crash-safe masters and slaves

- Replication event checksums for rock-solid data integrity across a replication cluster

- DevOps features with Time-Delayed Replication, Remote binary log backup and new CLI utilities

We had some great questions during the live session - here are a few:

Question: When multi-threaded slave is configured, how do you handle multi-database DML? 

Answer: When that happens the slave switches to single-threaded mode for that transaction. After it is applied, the slave switches back to multi-threaded mode.

Question: We have seen in past that replication would sometime break when complex queries get fired or alter statements take place. Neither row based or Statement based replication guaranteed 100% fail safe replication. Does MySQL5.6 address this issue ?

Answer: Replication can fail for several specific reasons. There is no general algorithm enabling "100% fail safe" replication. But we made sure that all that are unsafe are either converted to row based replication or issue warnings.

Question: We are looking for a HA cluster solution where most of the websites are Wordpress based. Do you have any recommendations on what kind of replication solution should we choose? The tables are using innodb

Answer: Using MySQL 5.6 with the master/slave replication we are discussing today, configured with GTIDs and using the mysqlfailover utility will provide you High Availability for your Wordpress cluster and InnoDB tables. Nothing you need to add - it is all part of MySQL you download today

Question: I have question about data loss and translating my-bin.000132 to my-bin.00101 when a master fails?

Answer: That's exactly the problem GTIDs address. Without GITDs it's hard to know the correct position on my-bin.000101 that corresponds to my-bin.000132.

With GTIDs we don't need to know positions, only what the slave has already applied, knowing that it's simple to send to the new master any missing transactions that are on other slaves. The data that was already on my-bin.00132 can be ensured that it was sent to a slave using Semi-syncronous replication.

Question: How do the new utilities work with a Pacemaker/Corosync setup?

Answer: The utilities integrate with Pacemaker/Corosync via extension points. The MySQL Pacemaker agent can call rpladmin to perform a switchover.

Question: Where can I learn more about supported HA solutions for MySQL?

Answer: From the Guide to MySQL HA solutions. I hope you find this useful

Question: Are there any plans to move replication from host-based to either 'database' or 'table' based?

Answer: Current MySQL replication supports replication of specific databases or tables from a Host using Replication filters. This means, we support Replication of a host (all data which could be replicated), replication of specific databases per host and specific tables per host. (You can ignore databases and tables using filters too).

For these and other questions, along with the full replay of the webinar, register here


Friday Mar 08, 2013

MySQL Web Reference Architectures - Your Guide to Innovating on the Web

MySQL is deployed in 9 of the top 10 most trafficked sites on the web including Facebook, Twitter, eBay and YouTube, as well as in some of the fastest growing services such as Tumblr, Pinterest and box.com

Working with these companies has given MySQL developers, consultants and support engineers unique insight into how to design database-driven web architectures – whether deployed on-premise or in the cloud.

The MySQL Web Reference Architectures are a set of documented and repeatable best practices for building infrastructure that deliver the highest levels of scalability, agility and availability with the lowest levels of cost, risk and complexity. 

Four components common to most web and mobile properties are sized, with optimum deployment architectures for each:

User authentication and session management

Content management

Ecommerce

Analytics and big data integration

The sizing is defined by database size and load, as shown below

For each reference architecture, strategies for scaling the service and ensuring high availability are discussed, along with approaches to secure, audit and backup user data, and tools to monitor and manage the environment.

The Reference Architectures cover the core underlying technologies supporting today’s most successful web services including:

- MySQL Database

- MySQL Cluster

- MySQL Replication

- Caching with Memcached and Redis

- Big Data with Hadoop

- NoSQL APIs

- Geographic Redundancy

- Hardware Recommendations

- Operational Best Practices

An example of the "Large" reference architecture is shown below

To learn more:

- Download the MySQL Web Reference Architectures Guide

- View the MySQL Web Reference Architectures slides

The Reference Architecture are designed as a starting point which we hope will enable you build the next web and mobile phenomenon!

Thursday Feb 07, 2013

MySQL 5.6 Replication Performance

With data volumes and user populations growing, its no wonder that database performance is a hot topic in developer and DBA circles.  

Its also no surprise that continued performance improvements were one of the top design goals of the new MySQL 5.6 release which was declared GA on February 5th (note: GA means “Generally Available”, not “Gypsy Approved” @mysqlborat)

And the performance gains haven’t disappointed:

- Dimitri Kravtchuk’s Sysbench tests showed MySQL delivering up to 4x higher performance than the previous 5.5 release.

- Mikael Ronstrom’s testing showed up to 4x better scalability as thread counts rose to 48 and 60 threads (not so uncommon in commodity systems today)

Of course, YMMV (Your Mileage May Vary) in real-world workloads, but you can be reasonably certain you will see performance gains by upgrading to MySQL 5.6.  It is up to you whether you invest these gains to serve more applications and users from your MySQL databases, or you consolidate to fewer instances to reduce operational costs.

How about if you are using MySQL replication in order to scale out your database across commodity nodes, and as a foundation for High Availability (HA)? Performance here has also been improved through a combination of:

- Binary Log Group Commit

- Multi-Threaded Slaves

- Optimized Row-Based Replication

As well as giving you the benefits above; higher replication performance directly translates to:

- Reduced risk of losing data in the event of a failure on the master

- Improved read consistency from slaves

- Resource-efficient binlogs traversing the replication cluster

We will look at each of these in more detail.

Binlog Group Commit

Rather than applying writes one at a time, Binary Log Group Commit batches writes to the Binlog, significantly reducing overhead to the master. This is demonstrated by the benchmark results below.


Using the SysBench RW tests, enabling the binlog and using the default sync_binlog=0 reduces the throughput of the master by around 10%. With sync_binlog=1 (where the MySQL server synchronizes its binary log to disk after every write to the binlog, thereby giving maximum data safety), throughput is reduced by a further 5%.

To understand the difference this makes, the tests were repeated comparing MySQL 5.6 to MySQL 5.5 


Even with sync_binlog=1, MySQL 5.6 was still up to 4.5x faster than 5.5 with sync_binlog=0

Gone are the days when configuring replication resulted in a 50% or more hit to performance of your master.

The result of Binary Log Group Commit is that MySQL replication is much better able to keep pace with the demands of write-intensive workloads, imposing much less overhead on the master, even when the binlog is flushed to disk after each commit!

Details of the configurations used for the benchmark are in the Appendix at the end of this post.

You can learn more about the implementation of Binlog Group Commit from Mats Kindahl’s blog.

Multi-Threaded Slave

Looking beyond the replication master, it is also necessary to bring performance enhancements to the replication slaves.

Using Multi-Threaded Slaves, processing is split between worker threads based on schema, allowing updates to be applied in parallel, rather than sequentially. This delivers benefits to those workloads that isolate application data using databases - e.g. multi-tenant systems deployed in cloud environments.

Benchmarks demonstrate that Multi-Threaded Slaves increase performance by 5x.  


This performance enhancement translates to improved read consistency for clients accessing the replication cluster. Slaves are better able to keep up with the master, and so users are much less likely to need to throttle the sustained throughput of writes, just so that the slaves don't indefinitely fall further and further behind (previously some users had to reduce the capacity of their systems in order to reduce slave lag).

You can get all of the details on this benchmark and the configurations used in this earlier blog posting

Optimized Row-Based Replication

The final piece in improving replication performance is to introduce efficiencies to the binary log itself.

By only replicating those elements of the row image that have changed following INSERT, UPDATE and DELETE operations, replication throughput for both the master and slave(s) can be increased while binary log disk space, network resource and server memory footprint are all reduced.

This is especially useful when replicating across datacenters or cloud availability zones.

Another performance enhancements is the was Row-Based Replication events are handled on the slave against tables without Primary Keys. Updates would be made via multiple table scans. In MySQL 5.6, no matter size of the event, only one table scan is performed, significantly reducing the apply time.

You can learn more about Optimized Row Based Replication from the MySQL documentation.  

Crash-Safe Slaves and Binlog

Aka “Transactional Replication” this is more of an availability than a performance feature, but there is a nice performance angle to it.

The key concept behind crash safe replication is that replication positions are stored in tables rather than files and can therefore be updated transactionally, together with the data. This enables the slave or master to automatically roll back replication to the last committed event before a crash, and resume replication without administrator intervention. Not only does this reduce operational overhead, it also eliminates the risk of data loss or corruption.

From a performance perspective, it also means there is one less disk sync, since we leverage the storage engine's fsync and don't need an additional fsync for the file, giving users a performance gain.

Wrapping Up

So with a faster MySQL Server, InnoDB storage engine and replication, developers and DBAs can get ahead of performance demands. Bear in mind there is more to MySQL 5.6 replication than performance – for example Global Transaction Identifiers (GTIDs), replication event checksums, time-delayed replication and more.

To learn more about all of the new replication features in MySQL 5.6, download the Replication Introduction guide

To get up and running with MySQL replication, download the new Replication Tutorial Guide 


Appendix – Binary Log Group Commit Benchmarks

System Under Test:

5 x 12 thread Intel Xeon E7540 CPUs @2.00 GHz

512 GB memory

Oracle Linux 6.1


Configuration file:

1) --innodb_purge_threads=1

2) --innodb_file_format=barracuda

3) --innodb-buffer-pool-size=8192M

4) --innodb-support-xa=FALSE

5) --innodb-thread-concurrency=0

6) --innodb-flush-log-at-trx-commit=2

7) --innodb-log-file-size=8000M

8) --innodb-log-buffer-size=256M

9) --innodb-io-capacity=2000

10) --innodb-io-capacity-max=4000

11) --innodb-flush-neighbors=0

12) --skip-innodb-adaptive-hash-index

13) --innodb-read-io-threads=8

14) --innodb-write-io-threads=8

15) --innodb_change_buffering=all

16) --innodb-spin-wait-delay=48

17) --innodb-stats-on-metadata=off

18) --innodb-buffer-pool-instances=12

19) --innodb-monitor-enable='%'

20) --max-tmp-tables=100

21) --performance-schema

22) --performance-schema-instrument='%=on'

23) --query-cache-size=0

24) --query-cache-type=0

25) --max-connections=4000

26) --max-prepared-stmt-count=1048576

27) --sort-buffer-size=32768

28) --table-open-cache=4000

29) --table-definition-cache=4000

30) --table-open-cache-instances=16

31) --tmp-table-size=100M

32) --max-heap-table-size=1000M

33) --key-buffer-size=50M

34) --join-buffer-size=1000000



Tuesday Feb 05, 2013

MySQL 5.6 Replication: New Resources for Database Scaling and HA

MySQL 5.6 reached GA (General Availability) today and is packed with a wealth of new features and capabilities.  Exciting stuff!

MySQL 5.6 also introduces the largest set of enhancements to replication ever delivered in a single release, including:
- 5x higher performance to improve consistency across a cluster and reduce the risks of data loss in the event of a master failing
- Self-healing clusters with automatic failover and recovery from outages or planned maintenance
- Assured data integrity with checksums implemented across the replication workflow
- DevOps automation

Of course, getting started with all of these enhancements can be a challenge - whether you are new to MySQL replication or an experienced user. So two new Guides are available to help take advantage of everything replication offers in MySQL 5.6.

The introduction takes you through new 5.6 features, including Global Transaction Identifiers (GTIDs), BinLog Group Commit, Multi-Threaded Slaves, Crash-safe replication, Checksums, etc.
To provide context to these features, the Introduction Guide takes Developers and DBAs through the concepts of replication, including: 

- Use-cases such as scaling out on commodity hardware, high availability, geo-redundancy for DR and data placement, and supporting complex analytics alongside high velocity OLTP operations in the same cluster

- The replication workflow and internals, discussing the roles of different threads and log files 

- Controlling data consistency using different synchronisation models

- Pros and cons of Statement-Based and Row-Based Replication 

- Monitoring and managing replication using tools such as the replication dashboard within MySQL Enterprise Monitor 


Having learned the concepts of replication and what's new in MySQL 5.6, the Tutorial provides step-by-step guides to configuring, provisioning and managing replication, covering:

- Configuring replication with master and slave .cnf files, creating users, introducing replication to an existing application, initialisation and checking proper operation, using GTIDs and other new features

- Migration to semi-synchronous replication for improved data consistency and reduced risk of data loss

- Administration and configuration by checking slave status, suspending replication and viewing binary logs

- Failover and recovery, including pre-requisites, detecting an outage, slave promotion, redirecting the application and recovering a failed master

These new resources provide everything you need to get started in building your next generation web, cloud, social or mobile application with the world's most popular open source database.

You will find more information in the Docs.  Also, look out for a live webinar in March where we will demonstrate all that is new in MySQL 5.6 replication.



Tuesday Jan 08, 2013

Deep Dive into GTIDs and MySQL 5.6 - What, Why and How

Global Transaction Identifiers (GTIDs) are one of the key replication enhancements in MySQL 5.6. GTIDs make it simple to track and compare replication across a master - slave topology. This enables:

- Much simpler recovery from failures of the master,

- Introduces great flexibility in the provisioning and on-going management of multi-tier or ring (circular) replication topologies.

A new on-demand MySQL 5.6 GTID webinar delivered by the replication engineering team is now available, providing deep insight into the design and implementation of GTIDs, and how they enable users to simplify MySQL scaling and HA. The webinar covers:

- Concepts: What is a GTID? How does the server generate GTIDs? What is the life cycle of GTIDs? How are GTIDs used to connect to a master?

- Handling conflicts

- How to skip transactions using GTIDs

- What happens when binary logs are purged

- How to provision a new slave or restore from a backup

- MySQL utilities for automated failover and controlled switchover

To whet your appetite, an extract of the Q&A from the webinar is as follows. These, and many other questions were answered during the session:

Q. Which versions of MySQL support GTIDs?

A. MySQL 5.6.5 and above

Q. Is GTID ON by default in 5.6?

A. It is OFF by default

Q. What does the GTID contain?

A. It is made up of a unique ID for the server followed by an ever-increasing counter that's specific to that server

Q: Do GTIDs introduce any increased space requirements?

A: Yes, since GTIDs are stored in the binary log, the binary logs will be larger. However, we expect the overhead to be relatively small. GTIDs are written to the binary log in two places:

(1) A small header is stored in the beginning of the binary log. This contains the variable @@gtid_purged, i.e., a list of previously logged GTIDS. Since the list is range-compressed, this is expected to be small: a small fixed-size header plus 40 bytes times the number of master servers in your topology.

(2) A fixed size header is added before each transaction in the binary log. This is 44 bytes, so will typically be small compared to the transaction itself.

Q. I understand GTID's are associated with Transactions. How do they map to the events within each transaction, or do GTID's map as an event itself in a binlog file?

A. Yes, GTIDs are associated with transactions. In the binary log, the GTID is realized as an event written prior to the events that constitute the transaction. The event is called a Gtid_log_event.

Q What if a transaction spans a filtered out table and a non-filtered out table? How does it get recorded on the slave?

A. If the filters are on the master, then a partly logged transaction will be replicated with its GTID.

If filtering on the slave side, a partial image will be processed on the slave and the original GTID is logged (to the slave's binlog) with the processed transaction.

Q. Prior to GTID, to build a new slave, we use mysqldump --master-data=1 to get the slave starting sync point in the dump. With GTID enabled, does it set the gtid_executed / purged in the dump instead?

A. Yes, mysqldump will detect that the server uses GTIDs and output a SET GTID_PURGED statement. (And there is an option to turn off that, e.g., in case you want to execute the output on an old server).

Q. How do GTIDs enable failover and recovery?

A. GTIDs are using in combination with the MySQL utilities. The mysqlfailover and rpladmin utilities provide administration of GTID-enabled slaves, enabling monitoring with automatic failover and on-demand switchover, coupled with slave promotion. GTIDs make it straightforward to reliably failover from the master to the most current slave automatically in the event of a failure. DBAs no longer need to manually analyze the status of each of their slaves to identify the most current when seeking a target to promote to the new master.

Resources to Get Started

In addition to the webinar, here are some other key resources that will give you the detail you need to take advantage of GTIDs in your most important MySQL workloads:

- Engineering blog: Global Transaction Identifiers – why, what, and how

- Engineering blog: Advanced use of GTIDs

- Documentation: Setting up replication with GTIDs 

- Video Tutorial: MySQL replication utilities for auto-failover and switchover 

- Engineering Blog: Controlling read consistency with GTIDs 

If you have any comments, questions or feature requests, don't hesitate to leave a comment on this blog

Monday Oct 08, 2012

New Options for MySQL High Availability

Data is the currency of today’s web, mobile, social, enterprise and cloud applications. Ensuring data is always available is a top priority for any organization – minutes of downtime will result in significant loss of revenue and reputation.

There is not a “one size fits all” approach to delivering High Availability (HA). Unique application attributes, business requirements, operational capabilities and legacy infrastructure can all influence HA technology selection. And then technology is only one element in delivering HA – “People and Processes” are just as critical as the technology itself.

For this reason, MySQL Enterprise Edition is available supporting a range of HA solutions, fully certified and supported by Oracle. MySQL Enterprise HA is not some expensive add-on, but included within the core Enterprise Edition offering, along with the management tools, consulting and 24x7 support needed to deliver true HA.

At the recent MySQL Connect conference, we announced new HA options for MySQL users running on both Linux and Solaris:

- DRBD for MySQL

- Oracle Solaris Clustering for MySQL

DRBD (Distributed Replicated Block Device) is an open source Linux kernel module which leverages synchronous replication to deliver high availability database applications across local storage. DRBD synchronizes database changes by mirroring data from an active node to a standby node and supports automatic failover and recovery. Linux, DRBD, Corosync and Pacemaker, provide an integrated stack of mature and proven open source technologies.


DRBD Stack: Providing Synchronous Replication for the MySQL Database with InnoDB

Download the DRBD for MySQL whitepaper to learn more, including step-by-step instructions to install, configure and provision DRBD with MySQL

Oracle Solaris Cluster provides high availability and load balancing to mission-critical applications and services in physical or virtualized environments. With Oracle Solaris Cluster, organizations have a scalable and flexible solution that is suited equally to small clusters in local datacenters or larger multi-site, multi-cluster deployments that are part of enterprise disaster recovery implementations. The Oracle Solaris Cluster MySQL agent integrates seamlessly with MySQL offering a selection of configuration options in the various Oracle Solaris Cluster topologies.


Putting it All Together

When you add MySQL Replication and MySQL Cluster into the HA mix, along with 3rd party solutions, users have extensive choice (and decisions to make) to deliver HA services built on MySQL

To make the decision process simpler, we have also published a new MySQL HA Solutions Guide.

Exploring beyond just the technology, the guide presents a methodology to select the best HA solution for your new web, cloud and mobile services, while also discussing the importance of people and process in ensuring service continuity.

This is subject recently presented at Oracle Open World, and the slides are available here.

Whatever your uptime requirements, you can be sure MySQL has an HA solution for your needs


Please don't hesitate to let us know of your HA requirements in the comments section of this blog. You can also contact MySQL consulting to learn more about their HA Jumpstart offering which will help you scope out your scaling and HA requirements.

Friday Sep 21, 2012

MySQL Connect 8 Days Away - Replication Sessions

Following on from my post about MySQL Cluster sessions at the forthcoming Connect conference, its now the turn of MySQL Replication - another technology at the heart of scaling and high availability for MySQL.

Unless you've only just returned from a 6-month alien abduction, you will know that MySQL 5.6 includes the largest set of replication enhancements ever packaged into a single new release:

- Global Transaction IDs + HA utilities for self-healing cluster..(yes both automatic failover and manual switchover available!)

- Crash-safe slaves and binlog

- Binlog Group Commit and Multi-Threaded Slaves for high performance

- Replication Event Checksums and Time-Delayed replication

- and many more

There are a number of sessions dedicated to learn more about these important new enhancements, delivered by the same engineers who developed them. Here is a summary

Saturday 29th, 13.00

Replication Tips and Tricks, Mats Kindahl

In this session, the developers of MySQL Replication present a bag of useful tips and tricks related to the MySQL 5.5 GA and MySQL 5.6 development milestone releases, including multisource replication, using logs for auditing, handling filtering, examining the binary log, using relay slaves, splitting the replication stream, and handling failover.


Saturday 29th, 17.30

Enabling the New Generation of Web and Cloud Services with MySQL 5.6 Replication, Lars Thalmann

This session showcases the new replication features, including

• High performance (group commit, multithreaded slave)

• High availability (crash-safe slaves, failover utilities)

• Flexibility and usability (global transaction identifiers, annotated row-based replication [RBR])

• Data integrity (event checksums)


Saturday 29th, 1900

MySQL Replication Birds of a Feather

In this session, the MySQL Replication engineers discuss all the goodies, including global transaction identifiers (GTIDs) with autofailover; multithreaded, crash-safe slaves; checksums; and more. The team discusses the design behind these enhancements and how to get started with them. You will get the opportunity to present your feedback on how these can be further enhanced and can share any additional replication requirements you have to further scale your critical MySQL-based workloads.


Sunday 30th, 10.15

Hands-On Lab, MySQL Replication, Luis Soares and Sven Sandberg

But how do you get started, how does it work, and what are the best practices and tools? During this hands-on lab, you will learn how to get started with replication, how it works, architecture, replication prerequisites, setting up a simple topology, and advanced replication configurations. The session also covers some of the new features in the MySQL 5.6 development milestone releases.


Sunday 30th, 13.15

Hands-On Lab, MySQL Utilities, Chuck Bell

Would you like to learn how to more effectively manage a host of MySQL servers and manage high-availability features such as replication? This hands-on lab addresses these areas and more. Participants will get familiar with all of the MySQL utilities, using each of them with a variety of options to configure and manage MySQL servers.


Sunday 30th, 14.45

Eliminating Downtime with MySQL Replication, Luis Soares

The presentation takes a deep dive into new replication features such as global transaction identifiers and crash-safe slaves. It also showcases a range of Python utilities that, combined with the Release 5.6 feature set, results in a self-healing data infrastructure. By the end of the session, attendees will be familiar with the new high-availability features in the whole MySQL 5.6 release and how to make use of them to protect and grow their business.


Sunday 30th, 17.45

Scaling for the Web and the Cloud with MySQL Replication, Luis Soares

In a Replication topology, high performance directly translates into improving read consistency from slaves and reducing the risk of data loss if a master fails. MySQL 5.6 introduces several new replication features to enhance performance. In this session, you will learn about these new features, how they work, and how you can leverage them in your applications. In addition, you will learn about some other best practices that can be used to improve performance.

So how can you make sure you don't miss out - the good news is that registration is still open ;-)

And just to whet your appetite, listen to the On-Demand webinar that presents an overview of MySQL 5.6 Replication.  

Wednesday May 23, 2012

MySQL 5.6 Replication: FAQ

On Wednesday May 16th, we ran a webinar to provide an overview of all of the new replication features and enhancements that are previewed in the MySQL 5.6 Development Release – including Global Transaction IDs, auto-failover and self-healing, multi-threaded, crash-safe slaves and more.

Collectively, these new capabilities enable MySQL users to scale for next generation web and cloud applications.

Attendees posted a number of great questions to the MySQL developers, serving to provide additional insights into how these new features are implemented. So I thought it would be useful to post those below, for the benefit of those unable to attend the live webinar (note, you can listen to the On-Demand replay which is available now).

Before getting to the Q&A, there are a couple of other resources that maybe useful to those wanting to learn more about Replication in MySQL 5.6

On-Demand webinar

Slides used during the webinar

For more detail on any of the features discussed below, be sure to check out the Developer Zone article: Replication developments in MySQL 5.6

So here is the Q&A from the event 

Multi-Threaded Slaves

The results from recent benchmarking of the Multi-Threaded Slave enhancement were discussed, prompting the following questions

Q. Going from 0 - 10 threads, did you notice any increase in IOwait on CPU?

A. In this case, yes. The actual amount depends on a number of factors: your hardware, query/transaction distribution across databases, server general and InnoDB specific configuration parameters.

Q. Will the multiple threads work on different transactions on the same database, or each thread works on a separate database?

A. Each worker thread works on separate databases (schemas).

Q. If I set the slave-parallel-workers to less than the number of databases, can I configure which databases use which worker threads?

A. There is no such configuration option to assign a certain Worker thread to a database. Configuring slave-parallel-workers to less than the number of databases is a good setup. A Worker can handle multiple databases.

Q. If I create 4 threads and I have 100 databases, can I configure which databases use which threads?

A. There won't be 1 worker to a mapping of exactly 25 databases, but it will be very close to that type of distribution.

Q. Thank You. I ask as we have about 8 databases that have a lot of transactions and about 30 that are used less frequently. It would be nice to have the ability to create 9 workers, 1 for each if the heavy databases and 1 for all the others

A. The current design enables relatively equal distribution of transactions across your databases, but it could be that 9 workers will fit anyway.

Q. Does multi-thread slave still work if there is foreign key across database?

A. MTS preserves slave consistency *within* a database, but not necessarily *between* databases. With MTS enabled and replication ongoing, updates to one database can be executed before updates to another causing them to be temporarily out of sync with each other.

Q. How is auto-increment managed with the multi-thread slave?

A. MTS makes sure Worker threads do not execute concurrently on the same table. Auto-increment is guaranteed to be handled in the same way as the single threaded "standard" slave handles auto-increment

Q. Can you use semi-synchronous replication on one of the slaves when using MTS?

A. Semi-sync is friendly to MTS. MTS is about parallel execution - so they cooperate well.

Optimized Row Based Replication

Q. If you only store the PK column in the Before Image for updates, does this mean you don't care about the slave's data potentially being out-of-sync? Will we be able to control how much data is stored in the binary logs?

A. The rule is that we ship a *PK equivalent* so that the slave is always able to find a row. This means:
  1. if master table has PK, then we ship the PK only
  2. if master table does not have PK, then we ship the entire row

Global Transaction IDs and HA Utilities

Q. Would the failover utility need to sit on a 3rd party host to allow arbitration?

A. The utility would typically run on a client, not on the hosts it is monitoring

Q. Can you explain the upgrade process to move to MySQL 5.6? I am assuming that the slave(s) are upgraded first and that replication would be backwards compatible. And after the slave(s) are upgraded, the master would be next. But then how do you turn on the GTID?

A. Right: the slave(s) are upgraded first. After upgrading Slaves they would be started with --gtid-mode=on (technically a couple of other options are needed as well). And then the same process would be followed for the upgraded Master.

Q. For failover functionality, if I had a setup like this: Server1 replicates to Server2, Server2 replicates to Server3, Server4, and Server5. If Server2 were to fail, can I have it configured so that Server1 can become the new master for Server3/4/5?

A. Yes - you can either configure the failover to the most recent slave based on GTID, or list specific candidates. What will happen is that Slave 1 will temporarily become a slave of 3, 4 and 5 to ensure it replicates any more recent transactions those slaves may have, and then it will become the master

Replication Event Checksums

Q. What is the overhead of replication checksums?

A. It is minimal - we plan to publish benchmarks over the summer to better characterize any overhead

Q. Are you exposing the checksum to the plugin api so we can add our own checksum types?

A. We prepared some of interfaces, i.e. checksum methods are identified by a specific code byte (1-127) values. But it's not really a plugin at this point.

Q. Do checksums verify both replicated data and the data on slave?

A. Yes - checksums are implemented across the entire path - so you can check for issues in replication itself, or in the hardware or network

Q. I think, it is better to turn on checksums at the relaylog to avoid overhead on the master, but if we do that and checksum fails (i.e. not matching the master's data) then what happens – will the slave throw an error

A. I agree, it's better to relax the Master, which verifies the checksum only optionally when the Dump Thread reads from the binlog prior to the replication event being sent out to the Slave. The slave mandatorily checks the checksummed events when they are sent across the network, and optionally when they are read from Relay-log. In either case, an error is thrown.

Q. Are checksums optional? In some cases we don't care for huge data loads

A. Yes, checksums are optional.

Time Delayed Replication

Q. Is Time delayed replication applied at the database level only and not for the entire slave?

A. Applied for the slave as execution is global.

Informational Log Events

Q. When we configure information log event, does it show meaningful query log for any binlog format? (Row based especially)

A. When using row based replication, you get the original query in human readable format... obviously you don’t want to see all the rows modified in a table of size, which can be huge

Q. Will the binlog include user id?

A. User id is replicated in some cases for Query-log-event - namely user id of the invoker when a stored routine is called.

Remote Binlog Backup

Q. What level of access does the remote binlog backup need?

A. Replication slave

Summary

As you can see, our Engineering team was kept busy with questions over the course of the webinar. Be sure to check out the MySQL 5.6 Replication webinar replay and if you have further questions, please don’t hesitate to use the comments below!

Tuesday Apr 10, 2012

New Replication, Optimizer and High Availability features in MySQL 5.6.5!

As the Product Manager for the MySQL database it is always great to announce when the MySQL Engineering team delivers another great product release.  As a field DBA and developer it is even better when that release contains improvements and innovation that I know will help those currently using MySQL for apps that range from modest intranet sites to the most highly trafficked web sites on the web.  That said, it is my pleasure to take my hat off to MySQL Engineering for today's release of the MySQL 5.6.5 Development Milestone Release ("DMR"). The new highlighted features in MySQL 5.6.5 are discussed here:

New Self-Healing Replication Clusters
The 5.6.5 DMR improves MySQL Replication by adding Global Transaction Ids and automated utilities for self-healing Replication clusters.  Prior to 5.6.5 this has been somewhat of a pain point for MySQL users with most developing custom solutions or looking to costly, complex third-party solutions for these capabilities.  With 5.6.5 these shackles are all but removed by a solution that is included with the GPL version of the database and supporting GPL tools. 

You can learn all about the details of the great, problem solving Replication features in MySQL 5.6 in Mat Keep's Developer Zone article

New Replication Administration and Failover Utilities
As mentioned above, the new Replication features, Global Transaction Ids specifically, are now supported by a set of automated GPL utilities that leverage the new GTIDs to provide administration and manual or auto failover to the most up to date slave (that is the default, but user configurable if needed) in the event of a master failure.

The new utilities, along with links to Engineering related blogs, are discussed in detail in the DevZone Article noted above.

Better Query Optimization and Throughput
The MySQL Optimizer team continues to amaze with the latest round of improvements in 5.6.5. Along with much refactoring of the legacy code base, the Optimizer team has improved complex query optimization and throughput by adding these functional improvements:

  • Subquery Optimizations - Subqueries are now included in the Optimizer path for runtime optimization.  Better throughput of nested queries enables application developers to simplify and consolidate multiple queries and result sets into a single unit or work.
  • Optimizer now uses CURRENT_TIMESTAMP as default for DATETIME columns - For simplification, this eliminates the need for application developers to assign this value when a column of this type is blank by default.
  • Optimizations for Range based queries - Optimizer now uses ready statistics vs Index based scans for queries with multiple range values.
  • Optimizations for queries using filesort and ORDER BY.  Optimization criteria/decision on execution method is done now at optimization vs parsing stage.
  • Print EXPLAIN in JSON format for hierarchical readability and Enterprise tool consumption.

You can learn the details about these new features as well all of the Optimizer based improvements in MySQL 5.6 by following the Optimizer team blog.

You can download and try the MySQL 5.6.5 DMR here. (look under "Development Releases")  Please let us know what you think!  The new HA utilities for Replication Administration and Failover are available as part of the MySQL Workbench Community Edition, which you can download here .

Also New in MySQL Labs
As has become our tradition when announcing DMRs we also like to provide "Early Access" development features to the MySQL Community via the MySQL Labs.  Today is no exception as we are also releasing the following to Labs for you to download, try and let us know your thoughts on where we need to improve:

InnoDB Online Operations
MySQL 5.6 now provides Online ADD Index, FK Drop and Online Column RENAME.  These operations are non-blocking and will continue to evolve in future DMRs.  You can learn the grainy details by following John Russell's blog.

InnoDB data access via Memcached API ("NotOnlySQL") - Improved refresh of an earlier feature release
Similar to Cluster 7.2, MySQL 5.6 provides direct NotOnlySQL access to InnoDB data via the familiar Memcached API. This provides the ultimate in flexibility for developers who need fast, simple key/value access and complex query support commingled within their applications.

Improved Transactional Performance, Scale
The InnoDB Engineering team has once again under promised and over delivered in the area of improved performance and scale.  These improvements are also included in the aggregated Spring 2012 labs release:

InnoDB CPU cache performance improvements for modern, multi-core/CPU systems show great promise with internal tests showing:   

  • 2x throughput improvement for read only activity
  • 6x throughput improvement for SELECT range
  • Read/Write benchmarks are in progress

More details on the above are available here.

You can download all of the above in an aggregated "InnoDB 2012 Spring Labs Release" binary from the MySQL Labs. You can also learn more about these improvements and about related fixes to mysys mutex and hash sort by checking out the InnoDB team blog.

MySQL 5.6.5 is another installment in what we believe will be the best release of the MySQL database ever.  It also serves as a shining example of how the MySQL Engineering team at Oracle leads in MySQL innovation.

You can get the overall Oracle message on the MySQL 5.6.5 DMR and Early Access labs features here.

As always, thanks for your continued support of MySQL, the #1 open source database on the planet!


Benchmarking MySQL Replication with Multi-Threaded Slaves

The objective of this benchmark is to measure the performance improvement achieved when enabling the Multi-Threaded Slave enhancement delivered as a part MySQL 5.6.

As the results demonstrate, Multi-Threaded Slaves delivers 5x higher replication performance based on a configuration with 10 databases/schemas. For real-world deployments, higher replication performance directly translates to:

· Improved consistency of reads from slaves (i.e. reduced risk of reading "stale" data)

· Reduced risk of data loss should the master fail before replicating all events in its binary log (binlog)

The multi-threaded slave splits processing between worker threads based on schema, allowing updates to be applied in parallel, rather than sequentially. This delivers benefits to those workloads that isolate application data using databases - e.g. multi-tenant systems deployed in cloud environments.

Multi-Threaded Slaves are just one of many enhancements to replication previewed as part of the MySQL 5.6 Development Release, which include:

· Global Transaction Identifiers coupled with MySQL utilities for automatic failover / switchover and slave promotion

· Crash Safe Slaves and Binlog

· Optimized Row Based Replication

· Replication Event Checksums

· Time Delayed Replication

These and many more are discussed in the “MySQL 5.6 Replication: Enabling the Next Generation of Web & Cloud Services” Developer Zone article 

Back to the benchmark - details are as follows.


Environment
The test environment consisted of two Linux servers:

· one running the replication master

· one running the replication slave.

Only the slave was involved in the actual measurements, and was based on the following configuration:

- Hardware: Oracle Sun Fire X4170 M2 Server

- CPU: 2 sockets, 6 cores with hyper-threading, 2930 MHz.

- OS: 64-bit Oracle Enterprise Linux 6.1
- Memory: 48 GB

Test Procedure
Initial Setup:

Two MySQL servers were started on two different hosts, configured as replication master and slave.

10 sysbench schemas were created, each with a single table:

CREATE TABLE `sbtest` (
   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
   `k` int(10) unsigned NOT NULL DEFAULT '0',
   `c` char(120) NOT NULL DEFAULT '',
   `pad` char(60) NOT NULL DEFAULT '',
   PRIMARY KEY (`id`),
   KEY `k` (`k`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

10,000 rows were inserted in each of the 10 tables, for a total of 100,000 rows. When the inserts had replicated to the slave, the slave threads were stopped. The slave data directory was copied to a backup location and the slave threads position in the master binlog noted.

10 sysbench clients, each configured with 10 threads, were spawned at the same time to generate a random schema load against each of the 10 schemas on the master. Each sysbench client executed 10,000 "update key" statements:

UPDATE sbtest set k=k+1 WHERE id = <random row>

In total, this generated 100,000 update statements to later replicate during the test itself.

Test Methodology:
The number of slave workers to test with was configured using:

SET GLOBAL slave_parallel_workers=<workers>

Then the slave IO thread was started and the test waited for all the update queries to be copied over to the relay log on the slave.

The benchmark clock was started and then the slave SQL thread was started. The test waited for the slave SQL thread to finish executing the 100k update queries, doing "select master_pos_wait()". When master_pos_wait() returned, the benchmark clock was stopped and the duration calculated.

The calculated duration from the benchmark clock should be close to the time it took for the SQL thread to execute the 100,000 update queries. The 100k queries divided by this duration gave the benchmark metric, reported as Queries Per Second (QPS).

Test Reset:

The test-reset cycle was implemented as follows:

· the slave was stopped

· the slave data directory replaced with the previous backup

· the slave restarted with the slave threads replication pointer repositioned to the point before the update queries in the binlog.

The test could then be repeated with identical set of queries but a different number of slave worker threads, enabling a fair comparison.

The Test-Reset cycle was repeated 3 times for 0-24 number of workers and the QPS metric calculated and averaged for each worker count.

MySQL Configuration
The relevant configuration settings used for MySQL are as follows:

binlog-format=STATEMENT
relay-log-info-repository=TABLE
master-info-repository=TABLE

As described in the test procedure, the
slave_parallel_workers setting was modified as part of the test logic. The consequence of changing this setting is:

0 worker threads:
   - current (i.e. single threaded) sequential mode
   - 1 x IO thread and 1 x SQL thread
   - SQL thread both reads and executes the events

1 worker thread:
   - sequential mode
   - 1 x IO thread, 1 x Coordinator SQL thread and 1 x Worker thread
   - coordinator reads the event and hands it to the worker who executes

2+ worker threads:
   - parallel execution
   - 1 x IO thread, 1 x Coordinator SQL thread and 2+ Worker threads
   - coordinator reads events and hands them to the workers who execute them

Results
Figure 1 below shows that Multi-Threaded Slaves deliver ~5x higher replication performance when configured with 10 worker threads, with the load evenly distributed across our 10 x schemas. This result is compared to the current replication implementation which is based on a single SQL thread only (i.e. zero worker threads).

Figure 1: 5x Higher Performance with Multi-Threaded Slaves

The following figure shows more detailed results, with QPS sampled and reported as the worker threads are incremented.

The raw numbers behind this graph are reported in the Appendix section of this post.



Figure 2: Detailed Results

As the results above show, the configuration does not scale noticably from 5 to 9 worker threads. When configured with 10 worker threads however, scalability increases significantly. The conclusion therefore is that it is desirable to configure the same number of worker threads as schemas.

Other conclusions from the results:

· Running with 1 worker compared to zero workers just introduces overhead without the benefit of parallel execution.

· As expected, having more workers than schemas adds no visible benefit.

Aside from what is shown in the results above, testing also demonstrated that the following settings had a very positive effect on slave performance:


relay-log-info-repository=TABLE
master-info-repository=TABLE

For 5+ workers, it was up to 2.3 times as fast to run with TABLE compared to FILE.

Conclusion

As the results demonstrate, Multi-Threaded Slaves deliver significant performance increases to MySQL replication when handling multiple schemas.

This, and the other replication enhancements introduced in MySQL 5.6 are fully available for you to download and evaluate now from the MySQL Developer site (select Development Release tab).

You can learn more about MySQL 5.6 from the documentation 

Please don’t hesitate to comment on this or other replication blogs with feedback and questions.

Appendix – Detailed Results

Tuesday Dec 20, 2011

MySQL 5.6.4 Development Milestone Now Available!

I am pleased to announce that the MySQL Database 5.6.4 development milestone release ("DMR") is now available for download (select the Development Release tab). MySQL 5.6.4 includes all 5.5 production-ready features and provides an aggreation of all of the new features that have been released in earlier 5.6 DMRs.  5.6.4 adds many bug fixes and more new "early and often" enhancements that are development and system QA complete and ready for Community evaluation and feedback.  You can get the complete rundown of all the new 5.6.4 specific features here.

For those following the progression of the 5.6 DMRs as the trains leave the station, you should bookmark these MySQL Engineering development team specific blogs:

You can also track the thought and innovation leaders on the MySQL Optimizer and the new Optimizer specific improvements in 5.6.4 by following the MySQL Optimizer Team member blogs:

And of course you can follow others on the Optimizer team and all of MySQL Engineering teams by bookmarking/subscribing to PlanetMySQL.

We look forward to your feedback on MySQL 5.6.4, so please download your copy now and help us make a better MySQL. 

As always, a sincere thanks for your continued support of MySQL!   


Monday Oct 03, 2011

More Early Access Features in the MySQL 5.6.3 Development Milestone!

For those with an interest in MySQL, this week at Oracle OpenWorld has gotten off to a great start.  Demonstrating how Oracle drives MySQL innovation Tomas' "State of the Dolphin" keynote on Monday gave a great overview of the new MySQL products that have recently been delivered:

Of these announcements I am most excited about the new early access features ready for community evaluation in the MySQL 5.6.3 DMR.  Since 5.5 was released last December we have gotten community and customer feedback that it is the best, highest quality release to date.  5.6 builds on 5.5 by improving:

The MySQL Optimizer for better query performance, scale, diagnostics
Oracle owns the MySQL optimizer and is making a huge investment in engineering and re-factoring to ensure the optimizer evolves to meet the most demanding and complex query requirements.  To this end, the 5.6.3 optimizer now includes:

File sort optimizations for queries with SELECT *, ORDER by non-indexed column, with small LIMIT values 

Currently such queries that ORDER by a non-indexed column perform a full table scan, create sort operations for entire table, potentially create temp tables, toss out the unneeded rows and merge the result set.  With the new default behavior the whole table is scanned, but only the needed rows are sorted and returned in the result set.  Early internal tests show 3x improvement in execution times when using the default sort buffer (of course your mileage may vary.)

Index Condition Pushdown ("ICP")

By default, the optimizer now pushes indexed columns in WHERE clause down to the storage engine for evaluation.  For example, for this table and related query on secondary key of postalcode+age:

CREATE TABLE person (
      personid INTEGER PRIMARY KEY,
      firstname CHAR(20),
      lastname CHAR(20),
      postalcode INTEGER,
      age INTEGER,
      address CHAR(50),
      KEY k1 (postalcode,age)
   ) ENGINE=InnoDB;

SELECT lastname, firstname FROM person
   WHERE postalcode BETWEEN 5000 AND 5500 AND age BETWEEN 21 AND 25; 


In this case the optimizer will use a range scan on postalcode from 5000-5500 (on index) and push the evaluation of age to InnoDB which will evaluate the value of age "BETWEEN 21 and 25".  For each match, InnoDB will return a row back to the server, which will continue requesting/receiving rows until all rows have been evaluated and returned.  Early non-scientific testing shows:

With 5 million rows in table, random values, running on commodity, desktop-grade hardware
RESULTS:
ICP disabled: Disk bound (default buffer pool 128 Mb) the query will take 15 seconds
ICP disabled: All data in memory (buffer pool 1.5 Gb) the query will take 1.4 seconds
ICP enabled: execution time reduced to 90 ms for both

Batched Key Access ("BKA") and Multi-Range Read ("MRR")

Multi-Range Read was introduced in the 5.6.2 DMR. MRR improves the performance of queries that use secondary indexes, by scanning one or more index ranges used in a query, sorting the associated disk blocks for the row data, then reading those disk blocks using larger sequential I/O requests. The speedup benefits operations such as
range index scans and equi-joins on indexed columns.

BKA builds on MRR by improving on the performance of disk-bound join queries.  With BKA the server fills up the join buffer with "join keys".  The server then sends batches of join keys to the storage engine layer (InnoDB or MyISAM, and soon NDB) for processing.  The storage engine then uses MRR to sort indexes according to the data sequence so the data pages can be accessed in sequence for efficiency.

Early testing of DBT3 Q3: Customer Distribution Query on InnoDB, buffer pool = 50 MB, 4GB InnoDB data file, and disk/memory ratio (~2% in memory) has shown execution time drops from 2000 sec to 10 sec with a sufficiently large join buffer.

EXPLAIN for INSERT/UPDATE/DELETE

EXPLAIN for SELECT has been available since MySQL 5.0.  This new feature is a long standing request from our community and customer base.  Best to get straight to an example of how it is used:

CREATE TABLE t1(c1 INT, c2 INT, KEY(c2), KEY(c2, c1));

EXPLAIN UPDATE t1 SET c1 = 10 WHERE c2 = 1;

mysql> EXPLAIN UPDATE t1 SET c1 = 10 WHERE c2 = 1;

+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows |  | Extra    |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | t1    | range | c2,c2_2       | c2   | 5       | NULL |    1 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+

1 row in set (0.00 sec)

In this example the optimizer will use a type=range (scan) on key=c2 to identify and UPDATE the affected row(s).

Optimizer Traces
This feature allows DBAs, developers to trace the decision-making steps of the optimizer.  The TRACE output shows how the EXPLAIN plan was generated and the decision points resolved along the way.  An example would be:

SET SESSION.OPTIMIZER_TRACE=enabled=on";

SELECT (SELECT 1 FROM t6 WHERE d = c) AS RESULT FROM t5 ;

select * from information_schema.OPTIMIZER_TRACE;

With output snippet:
...

"records_estimation": [
  {
    "database": "test",
    "table": "t6",
    "range_analysis": {
      "table_scan": {
         "records": 2,
         "cost": 4.5034
      },
      "potential_range_indices": [
        {
          "index": "d",
          "usable": true,
          "key_parts": [
            "d"
          ]
        }
      ],
      "best_covering_index_scan": {
        "index": "d",
        "cost": 1.4233,
        "chosen": true
       },


...

Optimizer tracing works for SELECT, EXPLAIN SELECT, INSERT or REPLACE ( with VALUES or SELECT), UPDATE/DELETE, SET, DO, DECLARE/CASE/IF/RETURN.

Other Optimizer related improvements in 5.6.3 include:

Late materialization of views/subqueries in FROM clause
This improvement enables fast EXPLAINs for views/subqueries and defers/avoids materialization when possible, enabling for quicker bail out if query is abandoned or canceled. Internal tests show a 240x improvement in execution time (drops from 8 min to 2 sec)

Persistent Optimizer Statistics for InnoDB
This allows for more stable, accurate InnoDB statistics and minimizes the overhead of recalculating statistics on start up or crash recovery.  Statistics can be manually updated when needed.

The 5.6.3 DMR also includes new features and improvements to the following:

Performance Schema now includes instrumentation for:

  • Statements/execution stages - What are my most resource intensive queries?
  • Table and Index I/O - Which app tables/indexes cause the most load?
  • Table Locks - Which tables cause the most contention?
  • Users/Hosts/Accounts level resource consumption - Who are the most resource intensive users, hosts, accounts?
  • Network I/O - Is it the network or my app?  How long do sessions idle?
  • Aggregated summaries by thread, user, host, account, object

InnoDB

  • New INFORMATION_SCHEMA tables (metrics, system and buffer pool information) for better diagnostics
  • Dump and restore buffer pool for quicker startup and recovery
  • Reduce contention during file extension for better concurrency

Replication

  • Crash-safe slaves and binlog, replication checksums for better data integrity
  • Multi-threaded slaves for better performance
  • time-delayed replication for better testing, data integrity
  • Informational log events for easier troubleshooting

And more...

You can learn more about the 5.6.3 DMR and other early access features by visiting the MySQL Developers Zone.  Here you will find technical articles, the MySQL Database 5.6.3 and MySQL Cluster 7.2 DMR downloads, supporting documentation and all related PlanetMySQL blogs.

If you are attending Oracle Open World this week please plan to attend as many of the 47 MySQL sessions as you can, including my sessions on the "MySQL Roadmap" (Tuesday at 5:00 pm) and "Using MySQL with Other Oracle Products" (Wednesday at 11:45 am PT).  Both are in the Marriott Marquis (just up the block from the Moscone Center), Golden Gate C2.  Also, be sure to join us at the MySQL Community Reception .  There will be many familiar faces and friends there, and it is open to all, even if you are not attending OOW.

As always, thanks for your continued support of MySQL!

 





About

Get the latest updates on products, technology, news, events, webcasts, customers and more.

Twitter


Facebook

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
2
5
6
9
10
11
12
13
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today