By Trent Lloyd on Feb 10, 2009
I ran into a problem today where a web application and a suite of servers relying on a MySQL server were not responding, or responding very slowly.
Upon investigation, we found a high number of "unauthenticated connections" in the MySQL process list. A common cause of this is that MySQL cannot look up the reverse DNS of the connecting servers, but it was strange that this had "suddenly" started happening. The servers in question used private IP addresses in the range 10.10.10.x.
The problem with this is that if you do not set up a local zone for the "10.10.10.x" reverse DNS, and configure your local caches so that those zones are looked up locally, the lookup heads out to the global internet to try to find a DNS server responsible for the 10.10.10.x reverse DNS (this translates into the DNS name x.10.10.10.in-addr.arpa).
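As a small illustration of that translation, Python's standard ipaddress module can produce the reverse DNS name for an address. The address 10.10.10.5 below is only an example standing in for "10.10.10.x".

```python
import ipaddress


def reverse_name(ip):
    """Return the in-addr.arpa name that a reverse lookup of `ip` queries."""
    return ipaddress.ip_address(ip).reverse_pointer


if __name__ == "__main__":
    # 10.10.10.5 is an illustrative address, not one from the incident.
    print(reverse_name("10.10.10.5"))
```

The octets are simply reversed and suffixed with in-addr.arpa, which is why a single zone such as 10.in-addr.arpa can cover every address starting with 10.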
Because the root servers were spending a lot of time fielding requests for these zones, only to send back negative replies, the queries were consuming a lot of resources and putting pressure on the root name servers. Thus, along came the AS112 project (http://www.as112.net/).
This project was set up to make a distributed network of servers that will answer requests for these zones (specifically, the reverse DNS for all RFC1918 private addresses). The root servers "delegate" these zones off to the project, and thus DNS caches around the world will remember that they were delegated and stop interrogating the root servers for all of these lookups.
AS112 uses a routing method called "Anycast", where multiple servers advertise to the internet that they own a single IP address, for example 18.104.22.168. Through the natural routing processes of the internet, a request then finds the closest server to respond to it. In this way the load of these queries is spread among multiple servers on the internet, and each client reaches a more local server with less latency, which means the DNS responses are faster.
What happened for this particular site is that the local AS112 server had gone off-line, and there were no others to field the queries. Thus, rather than the AS112 servers conveniently sending back negative responses, the queries were now timing out trying to reach an anycast server (22.214.171.124) which was not available.
MySQL's DNS resolver can take some time to time out, and so it held up all of the connections and made the application very slow, so much so that all the servers stopped responding to application requests in any useful amount of time.
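One way to check whether a host is affected is to time a reverse lookup of a client address from the database server itself. The following is a minimal sketch using Python's standard socket module; it goes through the system resolver, which is only an approximation of what MySQL does internally, and the function name and the `resolver` parameter are my own additions for illustration.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, seconds taken) for a reverse lookup of `ip`.

    `resolver` is a hypothetical hook so the function can be exercised
    without touching the network; by default the system resolver is used.
    """
    start = time.monotonic()
    try:
        name = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        name = None
    return name, time.monotonic() - start


if __name__ == "__main__":
    # 10.10.10.5 is an illustrative client address.
    name, seconds = time_reverse_lookup("10.10.10.5")
    print(f"name={name!r} took {seconds:.2f}s")
```

A quick negative answer (no name, a few milliseconds) is harmless. A lookup that hangs for many seconds before failing is the symptom described above.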
So now that we know why this problem occurred, how do we solve it?
This problem is not MySQL-specific, but MySQL is very often one of the few places it is really noticed, because a connection will not be accepted until the DNS lookup has completed, which may take a long time, and many applications that depend on MySQL are latency-sensitive. Users do not wish to sit for 30 seconds while their web pages load, for example.
So, to fix this problem at a global level, you need to reconfigure your local caching DNS server to respond to these queries.
In the case of most BIND DNS servers, you can simply set up a blank zone for 10.in-addr.arpa. Alternatively, you can populate the zone with the correct information, but a blank zone will at least prevent the delays above.
For djbdns' dnscache, you need to add a file /service/dnscache/root/servers/10.in-addr.arpa, and in that file specify the IP address of a DNS server to query for this zone. You will then need to set up either tinydns or some other DNS server to respond to these requests on the IP(s) you entered into that file.
Note that, depending on your setup, you may also need to set up zones for 168.192.in-addr.arpa, 16.172.in-addr.arpa (through 31.172.in-addr.arpa if you use the whole 172.16.0.0/12 range), or whatever other private IP ranges you make use of.
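To see which reverse zones a given private range needs, the zone names can be derived from the network itself. This is a rough sketch using Python's standard ipaddress module; the helper name `reverse_zones` is hypothetical, and it rounds each prefix up to a whole octet because in-addr.arpa zones are normally delegated on octet boundaries.

```python
import ipaddress

# The three RFC1918 private ranges.
RFC1918 = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]


def reverse_zones(cidr):
    """List the in-addr.arpa zones needed to cover the network `cidr`."""
    net = ipaddress.ip_network(cidr)
    octets = -(-net.prefixlen // 8)  # ceiling division: prefix -> whole octets
    zones = []
    for sub in net.subnets(new_prefix=octets * 8):
        labels = str(sub.network_address).split(".")[:octets]
        zones.append(".".join(reversed(labels)) + ".in-addr.arpa")
    return zones


if __name__ == "__main__":
    for cidr in RFC1918:
        zones = reverse_zones(cidr)
        print(cidr, "->", len(zones), "zone(s), starting with", zones[0])
```

The 172.16.0.0/12 range is the one that catches people out, since it needs sixteen separate zones rather than one.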
To solve this problem specifically in MySQL, you can configure MySQL with the 'skip_name_resolve' option, which should be placed in the [mysqld] section of my.cnf. This will prevent the server from attempting DNS lookups on connect.
There are two side effects to this option. The first is that your "PROCESSLIST" output will no longer include the hostnames of connected users, only their IP addresses. The second is that any MySQL GRANTs based on a hostname will no longer work, as the hostname is not looked up. All grants must use IP addresses, 'localhost' or a simple '%' wildcard.
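Before turning the option on, it is worth checking which existing grants rely on hostnames. The sketch below is a rough heuristic of my own, not anything MySQL provides: it flags host values that look like DNS names, and assumes that numeric wildcard patterns such as 10.10.10.% and IP/netmask pairs keep working since they match against the client's IP address.

```python
import ipaddress


def grant_host_needs_dns(host):
    """Heuristic: True if a grant's host value looks like a DNS name."""
    if host.lower() == "localhost" or host == "%":
        return False
    # Plain IP, optionally followed by a netmask (10.10.10.0/255.255.255.0).
    base = host.split("/", 1)[0]
    try:
        ipaddress.ip_address(base)
        return False
    except ValueError:
        pass
    # Numeric patterns with wildcards, such as 10.10.10.% (an assumption).
    if all(ch.isdigit() or ch in ".%_" for ch in host):
        return False
    return True


if __name__ == "__main__":
    # Illustrative host values only.
    for host in ["localhost", "%", "10.10.10.%", "web1.example.com"]:
        print(host, "->", grant_host_needs_dns(host))
```

The host values to feed in can be read from the host column of the mysql.user table. Anything flagged would need an IP-based grant before skip_name_resolve is enabled.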
In this case we decided to go for both approaches.
In summary, always make sure that all of your servers have an authoritative DNS server for both the reverse DNS of your LAN addresses and the search domain you have in your resolv.conf (for similar reasons).