Thursday Jul 04, 2013

Connection Failures Between Nodes in MySQL Cluster

If you're setting up MySQL Cluster on Oracle Linux or another Linux such as Red Hat, CentOS or Fedora, you might have come across some problems getting the nodes to communicate. There are two ways you might bump into problems with network connectivity: the iptables firewall, and SELinux. These security mechanisms might prevent your various nodes—management, data, and API—from communicating with each other, with differing symptoms depending on which mechanism is involved.

Let's have a look at what you're likely to see. 

Data nodes stuck at "starting"

The first thing you might notice is that your data nodes get stuck in the "starting" state. Running show in the management client gives something like this:

[ndbd(NDB)]     2 node(s)
id=2    @  (mysql-5.6.11 ndb-7.3.2, Nodegroup: 0, Master)
id=3    @  (mysql-5.6.11 ndb-7.3.2, starting, Nodegroup: 0)

...and it never moves away from starting.

Looking in the management node's log (possibly named something like ndb_1_cluster.log), you might see something like this, repeated many times:

[MgmtSrvr] INFO     -- Node 3: Initial start, waiting for 2 to connect,  
      nodes [ all: 2 and 3 connected: 3 no-wait:  ]

This sort of thing appears too:

[MgmtSrvr] INFO     -- Node 2: Waiting 60 sec for non partitioned start, 
     nodes [ all: 2 and 3 connected: 2 missing: 3 no-wait:  ]

Just to make matters worse, the first data node might start, but the second gets stuck at this point:

[ndbd] INFO     -- Start phase 0 completed 

What might confuse you is that both data nodes have clearly connected to the management node, yet they're not talking to each other. The reason lies in how the various nodes communicate with data nodes.

Data Node Communication 

Data nodes communicate with the management node on port 1186. This port is registered with IANA, and is often open on systems that have MySQL installed. Once a data node launches and has been given a node ID, it gets a unique dynamically allocated port—an "ephemeral port"—on its host, and opens a socket so other hosts can communicate with it. This port is assigned by the operating system, and as a result cannot be predicted by the data node; IANA suggests a port range of 49152–65535, but in Linux the default range runs from 32768 to 61000. You can see this range with the following command:

$ cat /proc/sys/net/ipv4/ip_local_port_range
32768	61000
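
You can see the same dynamic allocation at work from any program: binding to port 0 asks the kernel for an ephemeral port from that range, just as a data node does when it starts up. A minimal Python illustration:

```python
import socket

# Binding to port 0 asks the kernel to pick an ephemeral port,
# much as an ndbd data node does when it starts.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 0))
port = s.getsockname()[1]
print(port)  # a kernel-chosen port, typically within the range shown above
s.close()
```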

This is where the problem lies: Firewalls routinely block incoming traffic on those high ports, so while each data node can open a connection to the management node, they can't open connections to other data nodes if those nodes are listening on sockets that are blocked by the firewall. MySQL Cluster won't declare the cluster started until all data nodes have connected (unless you use --nowait-nodes, and in general you shouldn't), so they get stuck in "starting" until they can talk to other data nodes.

iptables and ServerPort

Many Linux servers use iptables as a firewall. To open the range of ephemeral ports in the Linux iptables firewall, use something like the following, replacing <source-address> with the address or subnet of the other cluster hosts:

iptables -I INPUT -p tcp --dport 32768:61000 -s <source-address> -j ACCEPT 

This isn't ideal, because you're opening thousands of ports, so MySQL Cluster provides a way to be more targeted in your setup. 

You can assign fixed ports to data nodes by using the ServerPort option in your management node's config.ini file, like this: 

[ndbd]
HostName=datahost1
ServerPort=50501

[ndbd]
HostName=datahost2
ServerPort=50502

Make sure to use ports that are unique to that host, and are unlikely to be used by other services. 

Once you've done that, open those ports in the data node hosts' firewalls. To match the example shown above, open port 50501 on datahost1 and port 50502 on datahost2. For example, the following command, when run as root on datahost1, opens its socket (again, replace <source-address> with the address or subnet of the other cluster hosts):

iptables -I INPUT -p tcp --dport 50501 -s <source-address> -j ACCEPT
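
A quick way to check whether a data node's port is actually reachable from another host is a short TCP probe. Here's a minimal sketch in Python; the function name port_open is my own, and the hostname and port in the comment match the hypothetical example above:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

# Run from another cluster host, for example:
# print(port_open("datahost1", 50501))
```

If this returns False from a host that should be able to connect, the firewall (or a missing ServerPort rule) is the likely culprit.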

Once you've verified that this works, save your firewall rules with the following command (run as root) on each data node:

service iptables save 

Stuck Management Nodes

Of course, it's also possible that your management node can't listen on port 1186; this is particularly likely on systems you've installed by unpacking a compressed archive (tar.gz or similar) rather than from RPM. If so, you might need to open that port too, using a similar technique.

For example, if you've configured two management servers, and port 1186 is not open on one of them, you'll see something like this when running show in the management client that cannot see the other:

ERROR Message: The cluster configuration is not yet confirmed 
   by all defined management servers. This management server 
   is still waiting for node 2 to connect.
Could not get configuration
*  4012: Failed to get configuration
*    The cluster configuration is not yet confirmed by all 
     defined management servers. This management server is 
     still waiting for node 2 to connect.

If one management node has port 1186 open and the other closed, the one that can communicate with the other displays this:

[ndb_mgmd(MGM)] 2 node(s)
id=1 (not connected, accepting connect from mgmhost1)
id=2    @  (mysql-5.6.11 ndb-7.3.2)

iptables and PortNumber

In this situation, the one that appears to be working is actually the one with the blocked port; it can see the other, because the other's port is available. Again, a quick blast of iptables (with <source-address> replaced by the address of the other management host) will fix the problem:

iptables -I INPUT -p tcp --dport 1186 -s <source-address> -j ACCEPT 

You can, of course, change the management node's port. The relevant option is PortNumber, although bear in mind that 1186 is a registered port, so changing it requires changing not only your iptables configuration, but also SELinux and any application connect strings you use.
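
For reference, the option goes in the management node's section of config.ini; a minimal sketch, where the hostname and the alternative port 1187 are hypothetical examples:

```ini
[ndb_mgmd]
HostName=mgmhost1
PortNumber=1187
```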

MySQL Server Node failing to connect

When you install MySQL from RPM—either MySQL Server or MySQL Cluster—the installer adds SELinux configuration to the machine. The default configuration allows MySQL's server process mysqld to talk to ports 3306 and 1186. These are the ports of the mysqld process and the MySQL Cluster management node respectively.

However, the default SELinux configuration doesn't know anything about MySQL acting as an API node in a MySQL Cluster, so doesn't configure any permissions that allow the mysqld process to access the data nodes' dynamic ports.

This typically manifests itself as the SQL node appearing to connect, but running show displays something like this:

[mysqld(API)]   1 node(s)
id=5 (not connected, accepting connect from any host) 

The MySQL Error log shows this:

[Warning] NDB : Tables not available after 30 seconds.  
   Consider increasing --ndb-wait-setup value 

 The cluster log shows this:

[MgmtSrvr] INFO     -- Nodeid 5 allocated for API at
[MgmtSrvr] INFO     -- Node 5: mysqld --server-id=1
[MgmtSrvr] INFO     -- Node 3: Node 5 Connected
[MgmtSrvr] INFO     -- Node 4: Node 5 Connected
[MgmtSrvr] INFO     -- Node 4: Node 5: API mysql-5.6.11 ndb-7.3.2
[MgmtSrvr] INFO     -- Node 3: Node 5: API mysql-5.6.11 ndb-7.3.2
[MgmtSrvr] ALERT    -- Node 4: Node 5 Disconnected
[MgmtSrvr] INFO     -- Node 4: Communication to Node 5 closed
[MgmtSrvr] ALERT    -- Node 3: Node 5 Disconnected
...and several more such lines.

Attempts to create a clustered table result in a vague error:

ERROR 157 (HY000): Could not connect to storage engine

 Running show warnings in the MySQL client gives a more bizarre result:

mysql> show warnings;
+---------+------+---------------------------------------------------------------------------------+
| Level   | Code | Message                                                                         |
+---------+------+---------------------------------------------------------------------------------+
| Warning | 1296 | Got error 4009 'Cluster Failure' from NDB. Could not acquire global schema lock |
| Warning | 1296 | Got error 4009 'Cluster Failure' from NDB                                       |
| Error   |  157 | Could not connect to storage engine                                             |
| Error   | 1499 | Too many partitions (including subpartitions) were defined                      |
+---------+------+---------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

The MySQL Error log adds lines like this:

[Warning] NDB: Could not acquire global schema lock (4009)Cluster Failure

SELinux and MySQL Cluster SQL Nodes 

This all looks pretty disastrous, but is easily fixed by tweaking SELinux a bit. I've written about SELinux and MySQL in the past, and the same advice in that article applies here. You could, for example, disable SELinux entirely. But, being a more intelligent reader, you're more likely to want to know how to configure it. 

Once you've set ServerPort as shown above, you can easily let SELinux in on that particular secret and permit the mysqld process to access the ports you've assigned by running the following command as root on the MySQL host:

semanage port -a -t mysqld_port_t -p tcp 50501-50524

This opens 24 ports starting with the two in the configuration example shown earlier. Having done this, your MySQL Server instance can talk to data nodes on their fixed ports.

Other Connections

A lot of what's said in this post also applies to other connections. If you've got a Java application that uses ClusterJPA to talk to your cluster, you'd better make sure it can see the management nodes and that SELinux isn't preventing it from accessing the data nodes.

Similarly, if you've changed your management node's port from the default, ensure you've configured the connection string correctly, with that port, in all of your applications' connect strings. This includes MySQL Server applications where you'll need to include the revised port number in the ndb-connectstring option. 

You don't need to tell your applications what hosts or ports your data nodes are at. That's the joy of a centralized configuration, isn't it?

Thursday Apr 11, 2013

Converting InnoDB Tables to MySQL Cluster

If you're considering using MySQL Cluster, you might be tempted to try it out on one of your existing databases. MySQL Cluster 7.3 (available as a development release at the time of this post) supports foreign keys, so it seems sensible to try migrating an InnoDB database with foreign keys to MySQL Cluster.

What won't work

For this example, I'm using the world_innodb.sql script available from the MySQL sample databases. It's a simple database with three tables: Country, City, and CountryLanguage. Both City and CountryLanguage have foreign keys referencing the Country table's Code field.

So, you might try something like this:

mysql> ALTER TABLE Country ENGINE=NDB;
mysql> ALTER TABLE City ENGINE=NDB;
mysql> ALTER TABLE CountryLanguage ENGINE=NDB;
Sadly, this won't work; InnoDB won't let you convert a table from InnoDB to another engine if the table is at either end of a foreign key, to avoid dangling constraints. If you try to convert an InnoDB table to NDB using one of the above statements, this happens:

ERROR 1217 (23000): Cannot delete or update a parent row: a foreign key constraint fails

Now, the clever among you might be aware of the foreign_key_checks variable, which disables InnoDB's foreign key constraint checking when you turn it off. It's useful when you're importing data from a script and don't want to check constraints until the import has finished. However, you can't use it to switch off constraints when changing a table's storage engine; it's designed to be used temporarily, for data imports, and it won't allow dangling constraints. Similarly, the ndb_deferred_constraints variable can't be used for this purpose either, because it doesn't affect InnoDB's constraint protection.

So how do I do it?

There are two ways to migrate InnoDB tables with foreign keys to NDB.

  1. Dump the database and edit the script so each table specifies ENGINE=NDB before re-importing the script to a new database
  2. Drop the constraints, alter the tables to use the NDB engine, and recreate the constraints

Dumping the database and editing the script is a straightforward use of mysqldump and a text editor.
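
The edit itself is simple string substitution, since mysqldump writes ENGINE=InnoDB at the end of each CREATE TABLE statement. As one sketch of that step (the function name innodb_to_ndb is my own):

```python
import re

def innodb_to_ndb(dump_sql):
    """Rewrite mysqldump output so each InnoDB table is created as NDB instead."""
    return re.sub(r"ENGINE=InnoDB\b", "ENGINE=NDB", dump_sql)

# Typical use: read world_innodb.sql, rewrite it, and import the
# resulting script into a new database on the cluster's SQL node.
```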

Dropping and recreating the constraints is more interesting, so I'll walk through that.

First, I use SHOW CREATE TABLE to see the constraints' definitions, and particularly their names:

*************************** 1. row ***************************
       Table: City
Create Table: CREATE TABLE `City` (
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int(11) NOT NULL DEFAULT '0',
  KEY `CountryCode` (`CountryCode`),
  CONSTRAINT `city_ibfk_1` FOREIGN KEY (`CountryCode`) 
     REFERENCES `Country` (`Code`)
1 row in set (0.02 sec)

Running the same command for the CountryLanguage table gives me the following constraint (the Country table does not have any foreign keys defined): 

CONSTRAINT `countryLanguage_ibfk_1` FOREIGN KEY (`CountryCode`) 
    REFERENCES `Country` (`Code`) 

Now I know what they're called, I can drop them:

mysql> ALTER TABLE City 
    -> DROP FOREIGN KEY city_ibfk_1;
Query OK, 4079 rows affected (0.95 sec)
Records: 4079  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE CountryLanguage 
    -> DROP FOREIGN KEY countryLanguage_ibfk_1;
Query OK, 984 rows affected (0.58 sec)
Records: 984  Duplicates: 0  Warnings: 0

Then I can run the ALTER TABLE statements shown at the top of this post:

mysql> ALTER TABLE Country ENGINE=NDB;
Query OK, 239 rows affected (4.15 sec)
Records: 239  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE City ENGINE=NDB;
Query OK, 4079 rows affected (4.02 sec)
Records: 4079  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE CountryLanguage ENGINE=NDB;
Query OK, 984 rows affected (4.32 sec)
Records: 984  Duplicates: 0  Warnings: 0

Finally, I can recreate the constraints based on their definitions from the output of the SHOW CREATE TABLE statements above:

mysql> ALTER TABLE City 
    -> ADD CONSTRAINT `city_ibfk_1` 
    -> FOREIGN KEY (`CountryCode`) 
    -> REFERENCES `Country` (`Code`);
Query OK, 0 rows affected (1.46 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE CountryLanguage 
    -> ADD CONSTRAINT `countryLanguage_ibfk_1` 
    -> FOREIGN KEY (`CountryCode`) 
    -> REFERENCES `Country` (`Code`);
Query OK, 0 rows affected (1.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

Now that the constraints are back, I can see if they work:

mysql> DELETE FROM Country WHERE Code='IRL';
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (Unknown error code)

All is as expected, and now my InnoDB data is working in my cluster. 


Jeremy Smyth writes MySQL training courses, and likes exploring interesting questions that come up from novices and experts alike.


