MySQL and HeatWave

Analyzing MySQL Servers in the Context of a Group

Andy Bang
Senior Software Developer
MySQL Enterprise Monitor (MEM) 3.0 is a huge improvement over MEM 2.x and I really hope you'll take a look at it.  My goal here is to tell you about a new feature that is near-and-dear to my heart--looking at a server in the context of a group of servers.

Why is this important?

As many of you know it's important that each server in a replication topology have a unique server_id (each slave, really).  It's not hard to give each server a server_id, but it's also very easy to forget this step, especially if you're cloning a slave from another slave, or if you're under the gun to do something quickly.  It's surprising how often our Support engineers and consultants tell us they see this in the field.

The problem is that if more than one server has the same server_id as another, problems can occur that may be difficult to detect.  Replication may appear to be working correctly, although at a slower rate than normal, as slaves with the same ID fight each other for connections to the master.  Or replication may not be working correctly at all, if a master doesn't send some binary log events to a slave because it thinks the slave already processed those events, when in fact they were processed by another slave with the same server_id.

In MEM 2.x we could look at each server individually and tell you, for example, that one has a server_id = 1, which is suspicious, but not necessarily wrong.  Now in MEM 3.0, the new Replication Configuration Advisor can look at each server in a group and tell you which ones (if any) have the same server_id.  It's a simple thing, but it can save you a lot of time and pain.

Related to that, the new Advisors in MEM 3.0 aren't restricted to a single simple expression that can look for one specific problem--they can now look for multiple problems.  For example, in addition to duplicate server_id's, the Replication Configuration Advisor will also look for slaves running an older version of the MySQL Server than their master, or slaves with max_allowed_packet sizes less than their master.  Both of those conditions can lead to "interesting" replication problems and errors as well.

This capability has allowed us to consolidate many of the individual rules you see in MEM 2.x into more comprehensive advisors in MEM 3.0, thus reducing the administrative burden of scheduling or un-scheduling many different rules.

However, we've tried to strike a balance here.  For example, you'll still find individual rules for "Slave Not Configured As Read Only" and "Slave Without REPLICATION SLAVE Accounts", as well as some new ones.

Why?  Because while the duplicate server_id, slave version, and max_allowed_packet problems are errors and can break replication, many of the other problems are really guidelines, not requirements, and can be followed or not based on your own company's practices and procedures.

Finally, analyzing a server in the context of a group is good for things other than replication.  For example, the new CPU Utilization Advisor now also looks for outliers in a group.  In a well-running system of related servers, the load across the servers will be reasonably well-balanced, with no one system being significantly more loaded than the others.  When one or two servers have much higher CPU utilization than those in the rest of the group, it often times indicates there is a problem that should be investigated.  MEM 3.0 can alert you when that occurs.

I hope you'll take a look at MySQL Enterprise Monitor 3!

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.