Note: this article was originally published on http://blogs.innodb.com on April 17, 2009 by Vasil Dimov.
Recently, it was reported (see MySQL bug #43660)
that “SHOW INDEXES/ANALYZE does NOT update cardinality for indexes of
InnoDB table”. The problem appeared to happen only on 64-bit systems,
but not 32-bit systems. The bug turns out to be a case of mistaken
identity. The real criminal here wasn’t the SHOW INDEXES or the ANALYZE
command, but something else entirely. It wasn’t specific to 64-bit
platforms, either. Read on for the interesting story about this mystery
and its solution …
InnoDB estimates statistics for the query optimizer by picking random
pages from an index. Upon detailed analysis, we found that the
algorithm that picks random pages for estimation always picked the same
page, thus producing the same result every time. This made it appear
that the index cardinality was not updated by ANALYZE TABLE. Going
deeper, the reason the algorithm always selected the same page was that
the random number generator always generated numbers that, when divided
by 3, always gave the same remainder (2).
The sampling algorithm selects a random leaf page by starting from
the root page and then selecting a random record from it, descending
into its child page and so on until it reaches a leaf page. In the
particular case that was reported in the bug report, the root page
contained only 3 records and the tree height was only 2 (i.e., the leaf
pages were all just below the root page).
You can already guess what happened. The “random” numbers generated,
not being so random, caused the algorithm to always pick the same
record from the root page (the second one) and then descend to the leaf
page below it. Every time. So, the 8 random pages that were sampled in
order to get an estimate of the whole picture were in fact the same
page, even in isolated ANALYZE TABLE runs.
So, clearly there was a problem with the random number generator.
But why didn’t this problem seem to appear on 64-bit platforms? It
would have, had we only enough time to wait. The random number
generator, always generating numbers like 3k+2 of type unsigned long, at
some point wrapped around 4 billion on 32-bit machines and started
generating numbers like 3k+1. On 64-bit machines, where unsigned long is
much bigger, this wrap did not occur. But it would have occurred if
we ran the test for 1000 years!.
So, on 32-bit machines, at some point the first record from the root
page was picked instead of the second one, and this caused some changes
in the results produced by ANALYZE TABLE. Yet, on 64-bit machines, for
all practical purposes, this “never” happens. By only looking at the
symptoms, one would get the impression that the flaw existed only on
64-bit machines and that 32-bit systems were ok.
Well, what about the fix? A possible fix would be to change InnoDB
in 64-bit environments to behave the same way it does in 32-bit
environments. People who are used to the behavior of InnoDB on 32-bit
machines and upgrade to a 64-bit machine might be satisfied, because the
problem on 64-bit systems was “solved”. But in reality, this approach
in no way would fix the underlying problem. The real solution is to
replace the random number generator with a better one (fully realizing
that algorithmic random number generators are only pseudo-random number generators).
Yet even that is not so simple. Making any change would have caused
changes to index cardinality estimations, thereby causing changes in
decisions made by the optimizer, resulting in different execution plans …
and different, possibly worse, performance for queries. Because MySQL
5.1 and 5.0 are frozen for such drastic changes, we fixed this bug in
the upcoming 1.0.4 release of the InnoDB Plugin.
In order to not break existing applications, and since many people
wanted a fix for MySQL 5.0 and 5.1, we implemented this fix for MySQL
under the control of a new configuration parameter
(innodb_use_legacy_cardinality_algorithm), which is turned on by
default, preserving past behavior. Because the “right fix” is to
permanently change the random number generator, this new configuration
parameter is not present in the InnoDB Plugin, and the “more random”
random number generator will always be used.
And that is the end of the case of mistaken identity. It turns out
that it is really hard to generate truly random numbers, hence the title
of this blog post.