Wednesday Sep 28, 2011

InnoDB at Oracle OpenWorld

Note: this article was originally published on Sept 28, 2011 by Calvin Sun.

Sunny and I will be presenting at Oracle OpenWorld next week:

  • Introduction to InnoDB, MySQL’s Default Storage Engine,  10/04/11 Tuesday 01:15 PM,   Marriott Marquis – Golden Gate C3,     Calvin Sun
  • InnoDB Performance Tuning,  10/04/11 Tuesday  03:30 PM,   Marriott Marquis – Golden Gate C2,   Sunny Bains

The first session is for beginners who are new to InnoDB and MySQL. The second session will cover many new performance features in MySQL 5.5 and 5.6, and share tuning tips to maximize MySQL performance.

Want to learn more about MySQL? There will be something for everyone. Come join us!

Wednesday Jul 27, 2011

InnoDB 2011 Summer Labs Releases

Note: this article was originally published on July 27, 2011 by Calvin Sun.

In April 2011, the InnoDB team published early access to NoSQL for InnoDB via memcached, plus several new features, as part of the MySQL 5.6.2 milestone release. This week, we announced additional early-access InnoDB features for the community to test and provide feedback on.

There are two release packages from the InnoDB team on MySQL Labs: InnoDB full-text search, and InnoDB new features.

InnoDB Full-Text Search

MySQL 5.5 makes InnoDB the default storage engine, so everyone can benefit from ACID-compliant transactions, referential integrity, and crash recovery. However, some users need InnoDB to have built-in full-text search, similar to MyISAM’s full-text search.

InnoDB full-text search provides users with the ability to build full-text indexes and search for specific text-based content stored in InnoDB tables. This new functionality supports fast and accurate search on document content using natural language, boolean, wildcard, and proximity search.

The design and implementation of InnoDB full-text search trace back to 2005, when Osku Salerma detailed the design in his master’s thesis, “Design of a Full Text Search index for a database management system”. Later, Sunny Bains and Jimmy Yang from the InnoDB team took over the development and made major contributions to this important feature.

Jimmy gave an overview of InnoDB full-text search and the main design differences between InnoDB full-text search and MyISAM full-text search. John provided a set of examples in his tutorial. As for the performance of InnoDB full-text search, you can find out in Vinay and Jimmy’s article.

Please download mysql-5.6-labs-innodb-fts from MySQL Labs and give it a try.

InnoDB New Features

The package mysql-5.6-labs-innodb-features on MySQL Labs consists of a set of new InnoDB features added since the MySQL 5.6.2 milestone release, except InnoDB full-text search. Some of the new features are already in the MySQL server main development tree, and the rest are intended to move into the main development tree in future development milestone releases and GA releases.

The new InnoDB features included in this package are:

  • Increase the max size of redo log files from 4GB to 2TB
  • Reduce contention during file extension
  • Make deadlock detection non-recursive
  • Improve thread scheduling
  • Change rw-lock to mutex for trx_sys_t
  • Option to preload InnoDB buffer pool
  • Allow UNDO logs to reside in their own tablespace
  • Reintroduce random readahead
  • Support smaller page sizes (4K & 8K)
  • Increase the max length of prefix index from 767 bytes to 3072 bytes

In addition to continuing to improve InnoDB performance and scalability, we are also focusing on optimizing InnoDB for flash drives. InnoDB on flash drives could benefit from new features such as larger redo log files, separate UNDO logs, smaller page sizes, and a preloaded buffer pool.

Group commit with binlog is released separately by MySQL replication team, also on MySQL Labs.

Want to learn the details of the new InnoDB features? Download mysql-5.6-labs-innodb-features from MySQL Labs, play with it, and read the blogs from the InnoDB engineers.

BTW, do not forget to send us feedback. Thanks for being interested in InnoDB!

InnoDB Full-Text Search Performance

Note: this article was originally published on July 27, 2011 by Vinay.

Long-awaited InnoDB support for full-text indexes is available in the InnoDB labs release. This post compares the performance of InnoDB’s full-text support with MyISAM’s. (There are multiple aspects of full-text index performance; the most important ones are index creation time and query execution time.) We are not focusing on DML performance (except some simple inserts on loaded data) or on other supported character sets (numbers are based on latin1 character-set data).

The numbers are encouraging in terms of ‘create index’ time, where InnoDB is comparable to or faster than MyISAM, and query execution times are comparable with MyISAM. Over time, we can expect further optimization of query performance.

Please refer to “Overview and Getting Started with InnoDB FTS” and the “InnoDB Full-Text Search Tutorial”, as this post focuses on the full-text search performance part.

a) Full-Text Search Create Index Time
Full-text index creation time depends on the number of words to be indexed, which in turn depends on the minimum length of the words you want to index. By default, all words of length >= 3 bytes get indexed in InnoDB (the MyISAM default minimum word length is 4).

When comparing ‘create index’ time, make sure MyISAM and InnoDB are indexing the same number of words from the data. InnoDB uses the innodb_ft_min_token_size variable to set the minimum word length. Creation time also depends on the maximum word length to be indexed, but that is generally not changed. The default stopword list for InnoDB contains fewer words than MyISAM’s stopword list, so InnoDB will index more words.
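To see why these settings matter, here is a toy Python tokenizer (a simplification for illustration only; the real parsers differ) showing how the minimum word length and stopword list change what gets indexed:

```python
import re

def tokenize(text, min_len, stopwords=frozenset()):
    """Split text into lowercase words, dropping words shorter than
    min_len bytes or present in the stopword list."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words
            if len(w.encode()) >= min_len and w not in stopwords]

text = "The quick brown fox jumps over the lazy dog"
# InnoDB-like defaults: 3-byte minimum, small stopword list.
print(tokenize(text, min_len=3, stopwords={"the"}))
# ['quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog']
# MyISAM-like defaults: 4-byte minimum, larger stopword list.
print(tokenize(text, min_len=4, stopwords={"the", "over"}))
# ['quick', 'brown', 'jumps', 'lazy']
```

With a lower minimum length and a shorter stopword list, more words survive tokenization, so InnoDB has more words to index for the same data.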

Looking at the InnoDB full-text search architecture details, it is clear that a user-defined FTS_DOC_ID column and the server parameter innodb_ft_sort_pll_degree can help increase performance. Let us see what these parameters mean, so we get an idea of why they are used and tuned to get better performance.

InnoDB full-text search requires a monotonically increasing “Doc ID” column for the full-text index. The user is not required to add this column; InnoDB will add it automatically if it is missing, at the cost of rebuilding the unique/primary index. From a performance point of view, it is good that the user can pre-define this column, saving a rebuild of the index. As of now, it needs to be an 8-byte unique non-null BIGINT column, and it must have the reserved name “FTS_DOC_ID” (all upper case).

For example, the user can avoid an index rebuild by the server by including the “FTS_DOC_ID” column when creating the table (the table name and the other columns here are illustrative):

create table wp (
    FTS_DOC_ID bigint unsigned not null auto_increment,
    title varchar(255) NOT NULL DEFAULT '',
    text mediumtext NOT NULL,
    unique key FTS_DOC_ID_INDEX (FTS_DOC_ID)
) engine=innodb;


Note: “FTS_DOC_ID” (reserved column name, all upper case) and FTS_DOC_ID_INDEX (reserved index name) are added here. These two are used by InnoDB internally; if either is missing, InnoDB will create it for you.

If you are not in a hurry and don’t want to manage your own Doc ID, you can simply create the table as follows. A subsequent CREATE FULLTEXT INDEX call will then rebuild the table, adding FTS_DOC_ID as a hidden column and FTS_DOC_ID_INDEX on FTS_DOC_ID (again, the table name and other columns are illustrative):

create table wp (
    title varchar(255) NOT NULL DEFAULT '',
    text mediumtext NOT NULL
) engine=innodb;

InnoDB full-text search has tokenization and sort phases that are handled using threads. The innodb_ft_sort_pll_degree variable determines how many parallel threads to use for tokenization and sorting. It has been observed that ‘create index’ is faster with larger values of innodb_ft_sort_pll_degree on multi-core machines.

The default value for innodb_ft_sort_pll_degree is 2. This is a static variable, so it must be set at server start-up time; it may be made dynamic in a later enhancement of InnoDB full-text search.
* Use a higher value for innodb_ft_sort_pll_degree (it takes the values 2, 4, 8, 16, ...)

The following numbers were obtained with this setup:

Schema: table as defined above
Index: (title, text) columns
Data size: approx. 2.7GB, 1 million rows, 238 million words
Platform: Linux x86 64-bit, 3GHz, 8 cores, 32GB RAM

‘Create Index’ Performance (by innodb_ft_sort_pll_degree):

          Default (2)     4               8               16
InnoDB    7 min 48.12s    5 min 12.06s    4 min 10.95s    3 min 33.40s
MyISAM    11 min 42.57s   11 min 32.27s   11 min 32.79s   11 min 25.17s

Note: innodb_ft_sort_pll_degree does not affect MyISAM; hence its ‘create index’ time stays almost the same, around 11 minutes.

One thing to note is that InnoDB full-text index creation uses extensive merge sorting, so on some systems we see memory usage larger than expected, due to the OS file system cache. You can bypass the OS file system cache on Linux or Solaris by setting innodb_disable_sort_file_cache to TRUE, which enables direct I/O. However, doing so might hurt merge sort performance to some extent.
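As an illustration of the kind of external merge sort involved (a sketch only, not InnoDB’s implementation), sorted runs are produced in bounded memory and then merged:

```python
import heapq

def external_sort(tokens, run_size):
    """Sort tokens in fixed-size runs, then k-way merge the sorted runs.
    In a real external sort the runs would live in temporary files,
    which is where the OS file system cache comes into play."""
    runs = [sorted(tokens[i:i + run_size])
            for i in range(0, len(tokens), run_size)]
    return list(heapq.merge(*runs))

tokens = ["mat", "cat", "sat", "ago", "fox", "bed"]
print(external_sort(tokens, run_size=2))
# ['ago', 'bed', 'cat', 'fox', 'mat', 'sat']
```

Each run fits in memory, while the merge step streams the run files, so the file system cache can inflate apparent memory usage unless direct I/O is used.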

It has been observed that CPU utilization is proportional to the innodb_ft_sort_pll_degree value. ‘Create index’ in InnoDB goes through phases called tokenization, merge sort, and insert. CPU utilization is listed for each phase respectively, along with memory utilization, for the default (2) value of innodb_ft_sort_pll_degree. (Note: CPU utilization values of the different phases are snapshot values of that phase.)


CPU (%) with innodb_ft_sort_pll_degree = Default (2):

               core 0   core 1   core 2   core 3   core 4   core 5   core 6   core 7
Tokenization   0        0        0        0        96.97    0        0        97.11
Merge Sort     20.23    0        0        1        0        46.56    0        1
Insert Data    26       24.14    36.14    28.14    37       36.68    39.7     31

Memory: 5.8% across all cores.

The above scenario, when repeated with the utf8 character set on the table, also showed improvement in ‘create index’ time with increased values of innodb_ft_sort_pll_degree. Insertion of 100 records on top of the 1M records was faster for InnoDB than for MyISAM: InnoDB inserted 100 records in 10 milliseconds, whereas MyISAM took 30 milliseconds.

b) Query Execution Time With Full-text Index
In this section, we briefly go over the execution time of some typical full-text search queries:

Queries used are:
1) select count(*) from wp where match(title,text) against ('Creationist +Abrahamic -creationism' in boolean mode);
2) select count(*) from wp where match(title,text) against ('Abrahamic');
3) select count(*) from wp where match(title,text) against ('preassumtions +orangutan' in boolean mode);
4) select count(*) from wp where match(title,text) against ('orangutan +falsified ~naturalistically' in boolean mode);
5) select count(*) from wp where match(title,text) against ('+india* -leader +gandh*' in boolean mode);
6) select count(*) from wp where match(title,text) against ('"american culture" @09' in boolean mode);

Query                    InnoDB (ms)   InnoDB FTS search processing time (ms)*   MyISAM (ms)
1                        10            2                                         10
2                        130           1                                         10
3                        1             1                                         10
4                        2             2                                         10
5                        320           116                                       1000
6 (new InnoDB feature)   5250

* FTS search processing time is the actual time spent by the InnoDB full-text index scan.

There are still ongoing activities to synchronize the MySQL optimizer and query processing layer with InnoDB FTS, so some query types are not yet fully tuned. There is further room for improvement for various queries.


We see ‘create index’ performance scale well with the degree of parallelism, and overall query execution performance is comparable with MyISAM.

InnoDB Full-Text Search Tutorial

Note: this article was originally published on July 25, 2011 by John Russell.

The InnoDB full-text search capability is an exciting feature. The full-text search itself is generally useful to have in an RDBMS. If an application is using all InnoDB tables except for one that is used for full-text searches, now that last table can be switched to InnoDB. If putting the full-text data in a MyISAM table led to scalability problems, duplication, or a less-than-ideal schema design, now those issues can be addressed.

In this post, I’ll take you through some of the basics of setting up and querying an InnoDB FULLTEXT search index. I’ll leave the scalability and performance aspects to Jimmy’s and Vinay’s blog posts, and just use some toy-sized data for demonstration purposes.

Creating a Table with a Full-Text Search Index

The key component of this feature is an index of type FULLTEXT, applied to one or more columns of an InnoDB table.

In Jimmy’s post, he mentions some scalability considerations where you might create the table (including a special FTS_DOC_ID column), load the data, then create the FULLTEXT index afterward. For simplicity (and since the data volume is so small), I’ll create the table with the index in place, then load the data afterward.

use test;
-- We will do some commits and rollbacks to demonstrate transactional features.
-- So turn off the default setting that commits immediately after each statement.
set autocommit=0;

drop table if exists quotes;
-- The InnoDB full-text search feature in the 5.6 Labs release
-- lets us define the FULLTEXT index on an InnoDB table.
-- The ENGINE=INNODB clause is just to illustrate the InnoDB aspect.
-- In 5.5 and above, new tables use InnoDB by default.

create table quotes
  (    id int unsigned auto_increment primary key
    , author varchar(64)
    , quote varchar(4000)
    , source varchar(64)
    , fulltext(quote)
  ) engine=innodb;

-- Get some words and phrases to search for into the table.
insert into quotes (author, quote, source) values
  ('Abraham Lincoln', 'Fourscore and seven years ago...',
  'Gettysburg Address')
, ('George Harrison', 'All those years ago...',
  'Live In Japan')
, ('Arthur C. Clarke', 'Then 10 years ago the monolith was discovered.',
  '2010: The Year We Make Contact')
, ('Benjamin Franklin',
  'Early to bed and early to rise, makes a man healthy, wealthy, and wise.',
  'Poor Richard''s Almanack')
, ('James Thurber',
  'Early to rise and early to bed makes a male healthy and wealthy and dead.',
  'The New Yorker')
, ('K', '1500 hundred years ago, everybody knew that the Earth was the center of the universe.',
  'Men in Black');
commit;

-- Since this is an InnoDB table, we are mindful of transactions.

Word and Phrase Search – Natural Language Mode

Once the data is loaded and committed, you can run queries using the MATCH(columns) AGAINST (search expression) operator to do the actual searches. You can combine this operator with all the usual WHERE and similar clauses in the SELECT statement.

The simplest kind of search is to find a single word, or a phrase with all the words in exact order. For this type of search, use the IN NATURAL LANGUAGE MODE clause inside the AGAINST() call. This technique typically involves a user-entered string that you pass verbatim to the query (of course, after escaping any quotation marks or other special characters to prevent SQL injection attacks).
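As a sketch of that escaping step (illustrative only; prefer your client library’s parameter binding), a helper might look like:

```python
def against_literal(user_input: str) -> str:
    """Escape backslashes and single quotes so a user-entered search
    string can be embedded as a single-quoted SQL literal.
    Illustrative only; real code should use placeholder binding."""
    escaped = user_input.replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"

term = "Poor Richard's Almanack"
sql = ("select author from quotes where match(quote) against ("
       + against_literal(term) + " in natural language mode)")
print(sql)
```

The doubled quote inside the literal keeps the user's apostrophe from terminating the string early.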

-- Search for a single word.
select author as "Monolith" from quotes
  where match(quote) against ('monolith' in natural language mode);
| Monolith         |
| Arthur C. Clarke |
1 row in set (0.01 sec)

-- Unlike with MyISAM full-text search, the default minimum
-- word length is 3 rather than 4, and the search also returns
-- words that appear in a high proportion of the table rows.
select author as "Ago" from quotes
  where match(quote) against ('ago' in natural language mode);
| Ago              |
| Abraham Lincoln  |
| George Harrison  |
| Arthur C. Clarke |
| K                |
4 rows in set (0.00 sec)

AND / OR / NOT Operators – Boolean Mode

For more complicated searches, you can have multiple words and phrases and search for different combinations of optional and required terms, not necessarily in the same order. This technique typically involves several data values that you query from elsewhere, or splitting apart a user-entered string and applying your own rules to the words and phrases inside.

-- Search for a combination of words, not in the same order as the original.
select author as "Ago and Years" from quotes
  where match(quote) against ('+ago +years' in boolean mode);
| Ago and Years    |
| Abraham Lincoln  |
| George Harrison  |
| Arthur C. Clarke |
| K                |
4 rows in set (0.00 sec)

-- Search for other Boolean combinations of words.
select author as "Fourscore or Monolith" from quotes
  where match(quote) against ('fourscore monolith' in boolean mode);
| Fourscore or Monolith |
| Abraham Lincoln       |
| Arthur C. Clarke      |
2 rows in set (0.00 sec)

select author as "Years and not Monolith" from quotes
  where match(quote) against ('+years -monolith' in boolean mode);
| Years and not Monolith |
| Abraham Lincoln        |
| George Harrison        |
| K                      |
3 rows in set (0.00 sec)

Proximity Search

Proximity search is a new feature of InnoDB full-text search. It is a special case of Boolean search using the @ operator within the AGAINST() string. You supply two or more words, double-quoted, within the single-quoted AGAINST() string, followed by @distance to specify how far apart these words can be. The distance represents the maximum number of bytes (which might not equal the number of characters) between the starting points of all these words.
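The distance semantics can be sketched in Python (a simplification that only checks the first occurrence of each word, unlike a real proximity search):

```python
def within_distance(text: str, words, distance: int) -> bool:
    """True if every word occurs in text and the byte offsets of the
    words' starting points all fall within `distance` bytes of each other.
    Only the first occurrence of each word is considered (a simplification)."""
    data = text.lower().encode()
    starts = []
    for w in words:
        pos = data.find(w.lower().encode())
        if pos < 0:
            return False  # a required word is missing entirely
        starts.append(pos)
    return max(starts) - min(starts) <= distance

quote = "Early to bed and early to rise, makes a man healthy, wealthy, and wise."
print(within_distance(quote, ["early", "wise"], 20))   # False
print(within_distance(quote, ["early", "wise"], 100))  # True
```

Note that the distance is measured between starting points in bytes, which is why multi-byte character sets can make the byte count differ from the character count.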

-- The starting points for these words are too far apart
-- (not within 20 bytes), so no results.
select quote as "Too Far Apart" from quotes
  where match(quote) against ('"early wise" @20' in boolean mode);
Empty set (0.00 sec)

-- But the starting points of all words are within 100 bytes,
-- so this query does give results.
select quote as "Early...Wise" from quotes
  where match(quote) against ('"early wise" @100' in boolean mode);
| Early...Wise                                                            |
| Early to bed and early to rise, makes a man healthy, wealthy, and wise. |
1 row in set (0.00 sec)

-- In this case, the smallest distance that produces results is 49.
select quote as "Early...Wise" from quotes
  where match(quote) against ('"early wise" @49' in boolean mode);
| Early...Wise                                                            |
| Early to bed and early to rise, makes a man healthy, wealthy, and wise. |
1 row in set (0.00 sec)

-- Here is an example showing 2 results, with the words close to each other.
select quote as "Early...Bed" from quotes
  where match(quote) against ('"early bed" @20' in boolean mode);
| Early...Bed                                                               |
| Early to bed and early to rise, makes a man healthy, wealthy, and wise.   |
| Early to rise and early to bed makes a male healthy and wealthy and dead. |
2 rows in set (0.00 sec)

Relevance Ranking

The relevance ranking is fairly basic, derived from word frequencies within each document and the search data overall. Typically, you would only ORDER BY this value for very simplistic searches of small documents; for any important search you would layer your own ranking logic on top, perhaps with the MySQL relevance value as one factor in the overall rank.
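As a rough illustration of frequency-based ranking (a generic TF-IDF-style toy, not InnoDB’s actual formula):

```python
import math

def relevance(word, doc, docs):
    """Toy score: term frequency within the document, weighted by how
    rare the word is across the collection. Not InnoDB's formula."""
    words = doc.lower().split()
    tf = words.count(word) / len(words)
    containing = sum(1 for d in docs if word in d.lower().split())
    idf = math.log((1 + len(docs)) / (1 + containing))
    return tf * idf

docs = ["early to bed and early to rise",
        "all those years ago",
        "fourscore and seven years ago"]
scores = [relevance("early", d, docs) for d in docs]
print(scores)  # highest for the first document, 0 for the others
```

Scores like this order documents reasonably for simple queries, which is why layering application-specific ranking on top is recommended for anything more demanding.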

-- Get the relevance of a single word.
select substr(quote,1,20) as "And",
  match(quote) against ('and' in natural language mode) as Relevance
  from quotes order by Relevance desc;
| And                  | Relevance          |
| Early to rise and ea | 0.2718571722507477 |
| Early to bed and ear | 0.1812381148338318 |
| Fourscore and seven  | 0.0906190574169159 |
| All those years ago. |                  0 |
| Then 10 years ago th |                  0 |
| 1500 hundred years a |                  0 |

6 rows in set (0.00 sec)


The key idea behind bringing full-text search to InnoDB tables is to make this feature compatible with transactions, so that you can include full-text columns alongside other columns in tables in ways that make sense in terms of schema design, and multiple sessions can update the full-text column data (and/or other columns in the table) simultaneously. The full-text data doesn’t have to be treated as read-only or read-mostly.

As mentioned in Jimmy’s blog post, the table structures that manipulate the full-text data behind the scenes are only updated at COMMIT time. So make sure to insert or update full-text data in one transaction, commit, and then run any full-text queries in a subsequent transaction. (Actually, in the examples below, it looks like the data is taken out of the full-text results as soon as a DELETE is issued, then comes back if the deletion is rolled back. I think that is explained in Jimmy’s blog post by the discussion about the delete-marking optimization to avoid huge updates to the full-text index for deleted data.)
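The delete-marking idea can be sketched with a toy inverted index (my own illustration, not InnoDB’s data structures): deletions are recorded in a side list and filtered out at query time, instead of rewriting the index.

```python
class ToyFtsIndex:
    """Toy inverted index demonstrating delete-marking: deleted doc IDs
    go into a side set and are filtered at query time, so a DELETE never
    forces an expensive rewrite of the word -> doc-ID lists."""
    def __init__(self):
        self.inverted = {}    # word -> set of doc IDs
        self.deleted = set()  # delete-marked doc IDs

    def add(self, doc_id, text):
        for w in text.lower().split():
            self.inverted.setdefault(w, set()).add(doc_id)

    def delete(self, doc_id):
        self.deleted.add(doc_id)  # cheap: no index rewrite

    def search(self, word):
        return sorted(self.inverted.get(word, set()) - self.deleted)

idx = ToyFtsIndex()
idx.add(1, "all those years ago")
idx.add(2, "fourscore and seven years ago")
idx.delete(2)
print(idx.search("ago"))  # [1]
```

Rolling back a deletion would simply remove the doc ID from the side set, which matches the observed behavior of deleted rows reappearing in search results after a rollback.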

drop table if exists quotes_uncommitted;

create table quotes_uncommitted
  (    author varchar(64)
    , quote varchar(4000)
    , source varchar(64)
    , fulltext(quote)
    , primary key (author, quote(128))
  ) engine=innodb;

-- We insert but don't immediately commit.
insert into quotes_uncommitted select author, quote, source from quotes;
-- Within the same transaction, a full-text search does not see the uncommitted data.
select count(author), author as "Uncommitted Results" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
| count(author) | Uncommitted Results |
|             0 | NULL                |
1 row in set (0.00 sec)

-- If the newly inserted rows are rolled back...
rollback;
-- ...then the full-text search still doesn't see them.
select count(author), author as "Rolled-Back Results" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
| count(author) | Rolled-Back Results |
|             0 | NULL                |
1 row in set (0.00 sec)

-- OK, let's start with some committed data in the table, then empty the table,
-- and then try some FTS queries
-- both before and after the commit.
insert into quotes_uncommitted select author, quote, source from quotes;
commit;
delete from quotes_uncommitted;
select count(author), author as "Deleted but still not committed" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
| count(author) | Deleted but still not committed |
|             0 | NULL                            |
1 row in set (0.00 sec)

rollback;
select count(author), author as "Deleted and rolled back" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
| count(author) | Deleted and rolled back |
|             4 | Abraham Lincoln         |
1 row in set (0.00 sec)

delete from quotes_uncommitted;
commit;
select count(author), author as "Deleted and committed" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
| count(author) | Deleted and committed |
|             0 | NULL                  |
1 row in set (0.00 sec)

insert into quotes_uncommitted select author, quote, source from quotes;
truncate table quotes_uncommitted;
select count(author), author as "Truncated" from quotes_uncommitted
  where match(quote) against ('ago' in natural language mode);
| count(author) | Truncated |
|             0 | NULL      |
1 row in set (0.00 sec)

Multi-Column Searches

Although you can only have one FULLTEXT index in an InnoDB table, that index can apply to multiple columns, allowing you to search when you aren’t sure which column contains the term. With a multi-column index, we can MATCH() against all the columns to find words that appear in any of those columns. Always reference all the same columns in the MATCH() clause as in the FULLTEXT index definition, because the information about which column the words appear in is not included in the full-text search data.

drop table if exists quotes_multi_col;

create table quotes_multi_col
  (    id int unsigned auto_increment primary key
    , author varchar(64)
    , quote varchar(4000)
    , source varchar(64)
    , fulltext(author, quote, source)
  ) engine=innodb;

insert into quotes_multi_col select * from quotes;
commit;

select author as "Poor 1 (NL)", substr(quote,1,15) as "Poor 2 (NL)", source as "Poor 3 (NL)" from
  quotes_multi_col where match(author, quote, source)
  against ('poor' in natural language mode);
| Poor 1 (NL)       | Poor 2 (NL)     | Poor 3 (NL)             |
| Benjamin Franklin | Early to bed an | Poor Richard's Almanack |
1 row in set (0.00 sec)

select author as "Poor 1 (BOOL)", substr(quote,1,15) as "Poor 2 (BOOL)", source as "Poor 3 (BOOL)"
  from quotes_multi_col where match(author, quote, source)
  against ('poor' in boolean mode);
| Poor 1 (BOOL)     | Poor 2 (BOOL)   | Poor 3 (BOOL)           |
| Benjamin Franklin | Early to bed an | Poor Richard's Almanack |
1 row in set (0.00 sec)

select author as "Clarke 1 (NL)", substr(quote,1,15) as "Clarke 2 (NL)", source as "Clarke 3 (NL)"
  from quotes_multi_col where match(author, quote, source)
  against ('clarke' in natural language mode);
| Clarke 1 (NL)    | Clarke 2 (NL)   | Clarke 3 (NL)                  |
| Arthur C. Clarke | Then 10 years a | 2010: The Year We Make Contact |
1 row in set (0.00 sec)

Interaction with Other Indexes

Remember that the design of your primary key index and secondary indexes is a big factor in query performance for InnoDB tables.

  • You can include parts (prefixes) of the full-text column(s) within the primary key.
  • However, that might not be a good idea if (a) the associated columns will ever be updated — which causes an expensive reorganization within the InnoDB table, or (b) if the table will have any other secondary indexes — the primary key values for a row are duplicated in the entry for that row in every secondary index, making index operations require more I/O and memory.
  • As mentioned in Jimmy’s blog post, adding the FULLTEXT index to the table is going to create a new column and associated index in the original table, so you could set up the column and index ahead of time, to avoid table reorganization later.
  • You can use the unique constraint of the primary key or a UNIQUE index to prevent duplicate values or combinations of values from being entered.
  • You can use the not-null constraint of the primary key to prevent blank values or combinations of values from being entered.
  • For the Labs release, the InnoDB FULLTEXT processing isn’t integrated with the MySQL optimizer and its estimates for which index is best to use, so don’t draw conclusions about performance characteristics from this early preview.


Stopwords are typically short, commonly used words that you designate as not significant for a search. They are left out of the FULLTEXT index and ignored when entered in FULLTEXT queries. For example, a search for ‘the’ is unsuccessful because it’s in the default stopword list. For your own customized search, you might create a bigger list (say, with common words from several languages) or a smaller one (for example, a music or movie site where words such as “The” in names and titles are significant). The details about customizing the stopword list are in Jimmy’s blog post.

Monday Jul 25, 2011

Improve InnoDB thread scheduling


Note: this article was originally published on July 25, 2011 by Sunny Bains.

InnoDB has had thread concurrency management code for some years now. Most will be familiar with the three configuration variables associated with this feature: innodb_thread_concurrency, innodb_concurrency_tickets, and innodb_thread_sleep_delay.
The problem with the existing code is that the queueing overhead becomes too high and negatively impacts performance, especially as the number of user threads goes up. The queueing code uses the os_event_t library to manage the queueing and dequeueing of user threads in an explicit wait queue. This wait queue is implemented using strict FIFO; the FIFO constraint ensures that there is no thread starvation.

To overcome this overhead, one experimental feature that we are trying is busy polling for free slots to enter InnoDB, using sleep delays. The new scheme, unlike the existing concurrency management scheme, is not starvation free. Before people start complaining that event-driven is better than polling, rest assured I know the theory. If there is a way to reduce the overhead and still keep it event driven, I’m very interested to see your code and experimental data :-) . I’ve experimented with futexes too, but no joy there either. It is reasonable to assume that I could have overlooked some better technique, therefore feedback is important.

In theory, if the sleep delay can be tuned to exactly the amount of time required for a slot to become free, then there should be perfect scheduling. This, as we know, is never going to be the case. However, we can try to get a good enough approximation that varies with the load. This will reduce the optimal or peak TPS due to the overhead of sleeping while there is potentially an empty slot. However, for applications that have lots of threads, say greater than 256, it will be able to maintain a higher TPS at the higher thread count because it doesn’t suffer from the queueing bottleneck.

The results have been very encouraging in our internal experiments on high-end hardware, hosts with >= 16 cores. There are some issues that need ironing out, in particular Sysbench OLTP read-only tests on 8-core hardware. This feature uses atomics to manage the thread concurrency slots.

New configuration variables
  1. innodb_adaptive_sleep_delay - This is a boolean variable that is used to enable/disable the adaptive sleep delay feature.
  2. innodb_adaptive_max_sleep_delay - This variable sets the upper bound, in microseconds, of the sleep delay that is possible when adaptive sleep delay is enabled.
How adaptive sleep delay works

It is a very simple algorithm, in fact deliberately so. I tried various feedback mechanisms, but none matched the RW results of this simple algorithm. At one stage I even toyed with the idea of going down the subsumption architecture path :-) .
The basic idea is that we try to reduce the sleep delay as fast as we can and increase it as slowly as we can. The reasoning is that this makes it adapt more quickly to a dynamic load. The code is rather simple and is all contained in the function srv_conc_enter_innodb_with_atomics(); see srv0conc.c.
  • if a thread has waited for more than one iteration, increment the sleep delay by one.
  • if a free slot is found and there are no waiting threads sleeping, halve the sleep delay.
This works surprisingly well on hardware with >= 16 cores with innodb_thread_concurrency set to the number of cores on the hardware. For read-write loads it works well on 8-core hardware too, but it has issues on 8-core hardware for Sysbench read-only tests. This last issue is under investigation.
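The two rules can be sketched as a tiny Python function (my paraphrase of the description above, not the actual srv0conc.c code):

```python
def adjust_sleep_delay(delay, waited_iterations, free_slot, waiting_sleepers):
    """Increase the delay slowly (by 1 after a multi-iteration wait) and
    decrease it quickly (halve when a free slot is found and no other
    thread is sleeping), mirroring the two rules in the post."""
    if waited_iterations > 1:
        return delay + 1
    if free_slot and waiting_sleepers == 0:
        return delay // 2
    return delay

delay = 8
# Contended: a thread waited several iterations, so back off a little.
delay = adjust_sleep_delay(delay, waited_iterations=3,
                           free_slot=False, waiting_sleepers=2)  # -> 9
# Load dropped: a free slot with no sleepers, so cut the delay in half.
delay = adjust_sleep_delay(delay, waited_iterations=1,
                           free_slot=True, waiting_sleepers=0)   # -> 4
print(delay)
```

The asymmetry (additive increase, multiplicative decrease) is what lets the delay shrink quickly when load drops while creeping up only gradually under contention.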

Below you can find charts for results obtained on two boxes with 8 and 24 cores. These results highlight the difference between two cases: innodb_thread_concurrency=0, versus innodb_thread_concurrency=<N_CPU> along with innodb_adaptive_sleep_delay=1.

box1:  8 cores (2.33GHz) / 16GB / RAID10
box2: 24 cores (2.66GHz) / 32GB / RAID10
MySQL 5.6.3 labs release
(You can find settings/test run details at the end of the post)


To use this new feature, simply set innodb_thread_concurrency to the available number of cores, enable innodb_adaptive_sleep_delay, and run your favourite benchmark software against it. As mentioned earlier, there is an issue with RO tests on 8-core hardware, which is being looked at. If there is an easy (and fast) way to make it starvation free, then that will be added. However, it performs quite well in RW tests. Your feedback is important to fine-tune this feature, so please do share it with us.

MySQL/sysbench settings

cmd line for prepare:
sysbench --test=oltp --oltp-table-size=1000000 --oltp-dist-type=uniform \
--oltp-table-name=sbtest --mysql-user=root --mysql-db=sbtest \
--mysql-host= --mysql-port=3306 --mysql-table-engine=INNODB prepare

cmd line for warmup:
mysql -uroot -h127.0.0.1 -P3306 -e'check table sbtest'

cmd line for run:
sysbench --num-threads=<16..1536> --test=oltp --oltp-table-size=1000000 \
--oltp-dist-type=uniform --oltp-table-name=sbtest --report-interval=1 --forced-shutdown=1 \
--max-requests=0 --max-time=120 --mysql-host= \
--mysql-user=root --mysql-port=3306 --mysql-db=sbtest \
--mysql-table-engine=INNODB <test options (see below)> run

parameters for OLTP_RO test:
parameters for OLTP_RW test:
parameters for UPDATE_KEY test:
--oltp-test-mode=nontrx --oltp-nontrx-mode=update_key run
parameters for POINT_SELECT test:
--oltp-point-selects=1 --oltp-simple-ranges=0 --oltp-sum-ranges=0 \
--oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=on --oltp-read-only=on
parameters for SELECT_SIMPLE_RANGES test:
--oltp-point-selects=0 --oltp-simple-ranges=1 --oltp-sum-ranges=0 \
--oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=on --oltp-read-only=on
parameters for SELECT_SUM_RANGES test:
--oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=1 \
--oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=on --oltp-read-only=on

This post was co-authored with Alexey Stroganov (a.k.a Ranger).

Allow UNDO logs to reside in their own tablespace

Note: this article was originally published on on July 25, 2011 by Sunny Bains.


The InnoDB UNDO entries reside in a special system table called the UNDO log. This log is made up of several segments, called rollback segments. A segment in InnoDB is similar to what a file would be in a file system, e.g., user tables and indexes are also stored as separate segments within the same tablespace, only their format is different. In that sense there is nothing special about InnoDB UNDO logs. This feature allows storing the UNDO log across several tablespaces.


UNDO logs contain the before image of modified records. There are two types of UNDO records, one for inserts and another for updates. The insert UNDO records can be discarded on transaction rollback. The update records are used for rollback, MVCC and by purge. It is because of purge that we can’t simply remove the UNDO log records once they are no longer referenced by any running transaction’s snapshot view (for MVCC). When a transaction is started it is assigned a rollback segment in a round-robin scheme. Multiple transactions can be assigned the same rollback segment to write their changes; up to a maximum of 1024 transactions can use the same rollback segment. If you have more rollback segments then there is less contention around the rollback segment mutex.


The purge thread(s) run in the background, reading the UNDO log records from the oldest to the latest, up to but not including the oldest active snapshot view in the system. Purge parses the UNDO log records, and for entries that represent delete operations it uses the stored index key to search for the records in both the secondary and primary indexes and purges the entries, modifying the index tree structure if required. Normal DML operations simply delete-mark the records; they don’t physically purge them, unless it is an insert that is being rolled back. This is to avoid expensive tree-modifying operations in DML code. Once purge is finished with the UNDO entries it truncates the UNDO log up to the point where it has processed the entries. For MVCC, user transactions follow the DATA_ROLL_PTR pointer into the UNDO log to build a previous version of a row.

Why did we make this change

All this activity results in lots of random IO. To reduce this random IO, this feature allows the user to place the UNDO logs anywhere in the file system. This flexibility can be used to place them on an SSD, for example.

New configuration variables

  1. innodb_undo_dir – This should be set to the directory where all the configured UNDO tablespaces will be created. Can be an absolute path too. Default is “.”.
  2. innodb_undo_tablespaces – This should be set to the number of UNDO tablespaces that you want created. Note: This must be set at the time of database creation. Once created this cannot be changed.
  3. innodb_undo_logs (renamed from innodb_rollback_segments) – This should be set to the minimum number of rollback segments that you want. More can be added later but they cannot be removed. However, setting it to a smaller value at runtime will ensure that only the number set will be used when assigning to transactions.

There are a few gotchas here that you should be aware of. The innodb_undo_tablespaces value can be set only at the time of database creation and must remain the same for the lifetime of the database. More UNDO tablespaces cannot be added and existing tablespaces cannot be dropped. We may add better UNDO tablespace management in the future.
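Putting the three variables together, a my.cnf fragment for creating a fresh instance with the UNDO logs on an SSD might look like the following sketch (the path is hypothetical; remember the tablespace count is fixed at database creation):

  [mysqld]
  # hypothetical SSD mount point for the UNDO tablespaces
  innodb_undo_dir=/ssd/mysql-undo
  # fixed at database creation time; cannot be changed later
  innodb_undo_tablespaces=4
  # rollback segments; more can be added later, none removed
  innodb_undo_logs=4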

Case 1. 128 rollback segments spread across 128 tablespaces.

  • innodb_undo_logs = 128
  • innodb_undo_tablespaces = 127 (we always create one log in tablespace 0, this is to avoid breaking the system tablespace layout)

Case 2. Start with one rollback segment, but allow for creating more later, spread across say 4 UNDO tablespaces.

  • innodb_undo_tablespaces = 4
  • innodb_undo_logs = 1

This will create the UNDO tablespaces but they will be empty; the single rollback segment that is created in tablespace 0 will do the work. Later, if you want to add more rollback segments, you will need to shut down the server and set the innodb_undo_logs variable to the number that you want. Let’s say we add two more rollback segments. We would restart the server with the following parameters:

  • innodb_undo_tablespaces = 4
  • innodb_undo_logs = 3 – This will create two new rollback segments, in UNDO tablespace 1 and UNDO tablespace 2. It will also disable the use of the rollback segment in tablespace 0.

If there were any UNDO records in rollback segment 1 they will be processed as normal. The only change is that no new ones will be written to rollback segment 1; it will be skipped when assigning rollback segments to new transactions.

Future work

Mapping of tablespace id to tablespace name is something that we would like to add so that users can change the path of individual tablespaces instead of all being lumped together in innodb_undo_dir. Managing these extra tablespaces is also something that we think is important, especially the adding and dropping of UNDO tablespaces including better management of the UNDO segments.


As we have seen, writing to the UNDO logs and accessing them can result in a lot of random IO. One way to reduce that overhead is to place the UNDO tablespaces on an SSD. This feature allows the placement of the UNDO logs anywhere on the file system, in a directory controlled by the configuration variable innodb_undo_dir. Note: this change is not backward compatible; databases created with this feature cannot be opened by older versions of InnoDB. The main problems are that the old code doesn’t open these new system tablespaces before applying REDO, and that there are some checks that assume the space id can only be 0. The file format as such is not the problem.

Shortened warm-up times with a preloaded InnoDB buffer pool

Note: this article was originally published on on July 25, 2011 by Vasil Dimov.

Are you running an InnoDB installation with buffer pool(s) many gigabytes in size? Does it take too long before it gets back up to speed after a restart? If yes, then the following will be interesting to you.

In the latest MySQL 5.6 Labs release we have implemented an InnoDB buffer pool(s) dump and load to solve this problem.

The contents of the InnoDB buffer pool(s) can be saved to disk before MySQL is shut down and then read back in after a restart, so that the warm-up time is drastically shortened – the buffer pool(s) return to the state they were in before the server restart! The time needed for that is roughly the time needed to read from disk an amount of data about the size of the buffer pool(s).

Let’s dive straight into the commands to perform the various dump/load operations:

The buffer pool(s) dump can be done at any time when MySQL is running by doing:

  mysql> SET innodb_buffer_pool_dump_now=ON;

This operation is very quick and creates a file named ib_buffer_pool in the InnoDB data directory, by analogy with the default InnoDB data file ibdata1. The file contains only the space and page IDs and is about 500k per gigabyte of buffer pool. The same operation can also be done automatically during MySQL shutdown:

  mysql> SET innodb_buffer_pool_dump_at_shutdown=ON;

By analogy the buffer pool(s) can be loaded manually by:

  mysql> SET innodb_buffer_pool_load_now=ON;

and automatically during startup from the file ib_buffer_pool by setting

  innodb_buffer_pool_load_at_startup=ON

in my.cnf or my.ini.

The filename where the buffer pool(s) are dumped and later loaded from can be specified by changing the value of the global configuration variable innodb_buffer_pool_filename which defaults to ib_buffer_pool.
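To make the whole warm-up cycle automatic, both sides can be enabled in the configuration file; a sketch:

  [mysqld]
  # save the buffer pool page list at shutdown...
  innodb_buffer_pool_dump_at_shutdown=ON
  # ...and read it back at the next startup
  innodb_buffer_pool_load_at_startup=ON
  # optional: override the default dump file name
  innodb_buffer_pool_filename=ib_buffer_pool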

Additionally the progress of the buffer pool(s) dump (although very quick) and load (not so quick) can be monitored by:

  mysql> SHOW STATUS LIKE 'innodb_buffer_pool_dump_status';
  mysql> SHOW STATUS LIKE 'innodb_buffer_pool_load_status';

And finally the load operation can be interrupted by:

  mysql> SET innodb_buffer_pool_load_abort=ON;

Before loading, the list is sorted so that pages are read in as close to sequential order as possible. The load is asynchronous, so it should not interfere with normal operations too much. The load is clever enough to handle different buffer pool sizes or pages that are missing during the load. A page will not be loaded if it is already in the buffer pool, so there is no need to worry about problems with changed pages.

The dump file is plain text so it is possible to edit it, though we do not recommend this.
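For illustration, the file is essentially a list of space ID and page ID pairs, one per line; a few hypothetical lines might look like this (the IDs are made up):

  0,7
  0,1
  3,4
  3,5

Editing it (for example, deleting lines) simply changes which pages are preloaded.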


Overview and Getting Started with InnoDB FTS

Note: this article was originally published on on July 25, 2011 by Jimmy Yang.

This summer’s lab release includes an important feature for InnoDB – InnoDB Full-Text Search. This feature greatly enhances InnoDB’s capability in text search and retrieval. Since the feature is designed for our transactional storage engine, its underlying architecture and implementation are completely different from those of MyISAM. So it is worth giving a brief technology review of this feature and familiarizing users with some important concepts in InnoDB FTS, so that they can better utilize it.

There are a few other posts on the subject. John Russell gives a brief tutorial on the InnoDB full-text search commands and syntax. I will also discuss some comparisons with MyISAM full-text search in another post. And Vinay from our server testing team will give some performance numbers from his experiments.

The Design:

To begin with, I will briefly go over some key design concepts, which will help you better understand the feature.

  • Inverted Index – Like many DBMS full-text search engines, InnoDB Full-Text Search (FTS) is designed as an “inverted index”, in which the incoming text is tokenized into individual words, and these words are stored in one or more auxiliary tables. For each word, a list of Document ID and word position pairs is stored. We call such a Document ID/position pair list an “ilist”. So in our inverted index table, the two important columns are the “word” and “ilist” columns, and there is an index on the word column. As you might note, since we store the word position (as a byte offset) for each word, we also support “proximity search”, which has been lacking in the MyISAM FTS.
  • The FTS “Index Tables” (auxiliary tables) – As mentioned, words in the inverted index are stored in a set of auxiliary tables; in our case, such tables are called FTS “Index Tables”. Our full-text indexes are mostly made up of this set of tables, which stores the word-to-document mapping. In our design, the FTS “Index Table” is “partitioned” into multiple tables. So instead of a single auxiliary table, there are now six of them, each containing words partitioned according to their first character. The partition criteria are currently hard coded, targeted at the Latin encoding, but we will eventually make the partitioning user definable, so users can define different partition schemes for different charsets.

The partitioned design makes it easier for us to parallelize the operation. Currently, we only parallelize the create index operation, but apparently we can extend such to query and fulltext search later on.

  • The FTS “Index Cache” – Although our full-text index is represented by a set of auxiliary “Index Tables” on disk, there is also an FTS “Index Cache” that batches results before they are flushed/synced to the “Index Table”. The “Index Cache” is actually a red-black tree that stores the word to Doc ID/position (ilist) information. However, this cache is different from an index tree in that it is only used to cache the words of inserted documents; words on disk do not need to be brought back into this cache before they can be queried. In other words, the words in the “Index Table” are queried directly, just as we query relational tables, and the result from the Index Table is merged (unioned) with the result from the Index Cache before being sent back to the user. The batching done by the FTS “Index Cache” relieves InnoDB from frequent updates to the “Index Table” under busy inserts and updates, so the index table is only touched at sync time, when the index cache is full. Another benefit of the batching is that it minimizes the number of entries for each word in the “Index Table”: instead of flushing each word with a single “ilist” entry after tokenizing one document, and making an entry for that word in the “Index Table”, we can create a longer “ilist” with multiple Doc ID/position pairs from different documents before flushing this information to disk as a single entry. This reduces redundancy and makes the “Index Table” smaller.
  • The FTS “Document ID” – Another important concept that affects usability is Document ID management. As with most FTS engines, the mapping from word to document goes through a unique (Document) ID. In our case, it is represented by an “FTS_DOC_ID” column that we automatically add to the user table. The column itself must be of the “BIGINT UNSIGNED NOT NULL” type, and a unique index named FTS_DOC_ID_INDEX is created on top of the column. On the other hand, we do not prohibit users from adding this column themselves beforehand; in that case, the user is responsible for properly managing the Doc ID column so that documents are not inappropriately represented.
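To make the inverted-index layout concrete, here is a simplified, hypothetical sketch of what one auxiliary “Index Table” looks like as a relational table (the table name and column sizes are illustrative, not the actual internal definition):

  CREATE TABLE fts_index_table_sketch (
      word  VARCHAR(337) NOT NULL,   -- a tokenized word
      ilist BLOB NOT NULL,           -- packed (Doc ID, position) pairs for this word
      INDEX (word)                   -- the index on the word column mentioned above
  ) ENGINE=InnoDB;

A search for a word then becomes an ordinary indexed lookup on such a table, with the matching ilist decoded into Doc IDs and positions.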

Those are the basic concepts needed for the next step. Now we can start to use full-text search.

Use the InnoDB Fulltext Index

1. The FTS DDL and Query Syntax

We continue to use the MySQL/MyISAM FTS query syntax, so if you are familiar with MySQL FTS syntax, you can get your hands on InnoDB FTS directly. All types of MyISAM search are supported, and in addition we added proximity search to the “boolean search”. John Russell covers the syntax and usage in his FTS tutorial blog, so I will leave that to him.

Instead, I will talk more about some special points that we need to know when using InnoDB FTS.

2. Create index:

In general, we recommend loading the table with data first, and then creating the FTS index on top of it. This is much faster than creating the FTS index first and then inserting documents into it. We have added Fast Index Creation (FIC) parallel sort support for creating a full-text search index, which allows you to tokenize, sort and create the FTS index in parallel.

To control the parallel sort degree, you can use the new “innodb_ft_sort_pll_degree” configuration variable (default 2, max 32). It specifies how many ways you want to parallelize the tokenization and sort operations. Experiments show that on a system that is not I/O bound, create index performance scales with the number of CPUs and “innodb_ft_sort_pll_degree”.

The following table shows a quick run on 2.7 GB of Wikipedia data on an 8-core machine:

Server                   Time
MyISAM                   11 min 47.90 sec
InnoDB (default)         7 min 25.21 sec
InnoDB pll_degree        5 min 34.98 sec
InnoDB pll_degree = 8    4 min 9 sec
InnoDB pll_degree = 16   3 min 39.51 sec

Another trick when creating the index concerns Doc ID handling. If a user does not supply an “FTS_DOC_ID” column, InnoDB adds a new hidden column to the table, which results in a rebuild of the clustered index along with all secondary indexes – this can be costly. If the user includes “FTS_DOC_ID” in the original table definition, the clustered index (table) rebuild can be avoided. To do so, include a column named “FTS_DOC_ID” (all upper case) in the table. The column must be of the “BIGINT UNSIGNED NOT NULL” datatype. It does not need to be an auto_increment column, but that can make life easier.

An example would be:

1) CREATE TABLE fts_test (
       FTS_DOC_ID BIGINT UNSIGNED NOT NULL,
       title VARCHAR(200),
       body TEXT
   ) ENGINE=InnoDB;

2) CREATE UNIQUE INDEX FTS_DOC_ID_INDEX ON fts_test (FTS_DOC_ID);

The unique “FTS_DOC_ID_INDEX” index on “FTS_DOC_ID” is optional; without it, InnoDB will create it automatically.

3) Load data

4) CREATE FULLTEXT INDEX idx ON fts_test (title, body);

Please note, the column name “FTS_DOC_ID” has to be in upper case, as the column name is case sensitive in InnoDB. In the next release we will relax this requirement, so that the user can specify any column name for this ID column.

3. DMLs

- FTS Index Updated on Commit:

As with full-text search indexes in all transactional DBMSs, updating the full-text index has always been an issue. A typical example is Oracle Text, which traditionally synchronized its data to the FTS index periodically or manually; starting in 11g, it allows the user to specify when a data update is reflected in the full-text index – manually, on commit, or at a regular interval. For InnoDB full-text search, the FTS index is updated at transaction commit time: the document tokenization and insertion into the FTS index (cache) happen only after you commit the transaction. However, let’s not confuse this delayed update of the FTS index with the data handling itself. All data manipulations still follow their original rules; only the update to the FTS index is done at transaction commit time.

- How insert is handled

The inserted document is tokenized at commit time and inserted into the “Index Cache”. This cache has a configurable size (“innodb_ft_cache_size”) with a default of 32 MB. Once the cache is full, its contents are synced to the on-disk “Index Tables”.

During a normal server shutdown, the content of the “Index Cache” is synced to the “Index Tables” on disk. However, if there is a server crash and the content of the “Index Cache” was not reflected in the “Index Tables”, then after the server reboots, the first time the FTS index on the table is used (for search or insert), the “missing” documents are read from the original table, re-tokenized, and added to the “Index Cache”.

- How delete is handled

While inserted documents are tokenized and inserted into the FTS “Index Cache” at commit time, we do not delete any entries for deleted documents if such entries are already in the FTS index tables. Instead, we record the deleted Doc ID in a “DELETED” auxiliary table, and each query also consults this “DELETED” table to filter out deleted documents. With this design the delete operation is simple and fast, without the need to update thousands of word entries for the deleted document. The downside is that the word entries are not removed from the FTS index, so the index can grow; this is resolved by “index optimization”.

4. Index Optimization:

As just discussed, if there are active DMLs on the full-text index, the index can become “bloated” over time. We do not delete index entries in the index tables at run time; instead, we just log the deleted Doc ID in a “DELETED” auxiliary table. So over time the index tables themselves can grow bigger even as you remove documents. To resolve this, we can “optimize” the index. This operation does two things:

1) Remove the deleted Doc ID from the word’s DocID/Position pair list (ilist)

2) Consolidate multiple entries for the same word to one (or less) entries if possible by consolidating their DocID/Position pair list (ilist)

Currently, we overload the optimization operation onto the “OPTIMIZE TABLE” command, and it runs only if you turn on the “innodb_optimize_fulltext_only” configuration variable:

mysql> set global innodb_optimize_fulltext_only=1;

mysql> optimize table articles;

Since the table could be large and optimization could take a tremendous amount of time, we allow the optimization to be done in stages. The configuration variable “innodb_ft_num_word_optimize” specifies how many words to optimize for each “OPTIMIZE TABLE” command; its default value is 2000, meaning each invocation optimizes 2000 words. When the next “OPTIMIZE TABLE” command is issued, the server continues from the word it left off at last time.
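Putting the pieces together, a staged optimization session might look like the following (the table name is hypothetical):

  mysql> set global innodb_optimize_fulltext_only=1;
  mysql> set global innodb_ft_num_word_optimize=10000;  -- larger batch than the 2000 default
  mysql> optimize table articles;  -- optimizes the first batch of 10000 words
  mysql> optimize table articles;  -- continues with the next batch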

In fact, this optimization process could run as an online background process, but that is not enabled in this release. We will continue to evaluate these options to see which approach is more appropriate and gives the minimum user impact.

5. Stopword Handling:

For FTS stopwords, we provide two sources of stopwords:

1) Default stopwords provided by the server – this is a static list of stopwords that comes with the server. If no user stopword list is defined, this default list is used.


mysql> select * from INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;

2) User-defined stopwords – users can define their own stopwords by creating a table with a single column named “value” of the “varchar” datatype, and pointing the global variable “innodb_ft_server_stopword_table” at it. The server then loads stopwords from this user table, rather than from the default stopword list, when the FTS index is created. An example:

# Define a correctly formatted user stopword table
create table user_stopword(value varchar(30)) engine = innodb;

# Point to this stopword table with “db name/table name”
set global innodb_ft_server_stopword_table = 'test/user_stopword';

6. Ranking:

In this initial release, we use a very simple ranking mechanism (term frequency / inverse document frequency) for ranking document results. This means that the more often a word appears in a document, and the less frequently it appears across all documents, the higher the selected document is ranked. We do have plans to add more sophisticated ranking mechanisms and to allow users to supply weights for the ranking calculation too.
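The computed relevance can be retrieved directly in a query. Assuming a hypothetical articles table with a FULLTEXT index on (title, body), a ranked search might look like:

  mysql> SELECT id, MATCH (title, body) AGAINST ('storage engine') AS score
      -> FROM articles
      -> WHERE MATCH (title, body) AGAINST ('storage engine')
      -> ORDER BY score DESC;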

7. Some Restrictions:

Lastly, we talk about a couple of restrictions for our initial release:

  1. InnoDB full-text search currently supports only one FTS index per table.
  2. Same as MyISAM, although the use of multiple character sets within a single table is supported, all columns in a FULLTEXT index must use the same character set and collation.
  3. Ideographic languages such as CJK (Chinese, Japanese and Korean) do not have word delimiters; similar to MyISAM, we do not yet support N-gram parsing for them, but this is on our feature agenda.


In summary, InnoDB full-text search gives InnoDB an important capability in text handling, and this newly added platform is already showing its scalability in indexing and its flexibility in DML handling. We are also considering bringing parallelism to the query and DML operations to further enhance performance. That said, it is still at an early stage, and we will continue the development effort, adding features and options and tuning the performance to make it a competitive feature.

Difference between InnoDB FTS and MyISAM FTS

Note: this article was originally published on on July 25, 2011 by Jimmy Yang.

With this lab release, InnoDB now has its own full-text search (FTS) support. People might wonder how different this FTS is compared with that of MyISAM. From the usage standpoint, they seem very similar, with no apparent difference. However, in the underlying FTS engine the difference is huge, since the two are built on different storage engines with totally different designs. In some respects, such as DML, it is difficult to make an apples-to-apples comparison. However, we do try to list and explain the differences, to make it easier for users to understand our design and to ease the transition to InnoDB FTS.

The Design:

The most significant difference is the design. Like most transactional storage engines, InnoDB stores the inverted index in auxiliary relational tables; all queries and DMLs eventually route to these tables. On the other hand, MyISAM stores the inverted list in a two-level B-tree. The first level contains records of each word and its count, and the second level contains records of the word’s weight and the rowid. So it is more of a special B-tree “index”.

However, the MyISAM design is not suitable for a highly concurrent transactional storage engine like InnoDB, and its special structure limits its extensibility – for example, it still can’t include position information in its index to support proximity search.

For InnoDB, storing the inverted index in auxiliary tables brings all the benefits of relational table manipulation to the full-text index. It already applies a partitioning concept, so that multiple tables are used to hold the inverted index. This not only allows index creation to be parallelized, it also allows queries to be parallelized in the future. So the InnoDB design clearly wins in terms of future extensibility and scalability.

The Query Syntax:

Even though the underlying FTS design is different, InnoDB continues to use the same syntax defined by the MySQL server. The create index clause and almost all query syntax remain the same. Of course, InnoDB now supports proximity search, but it does so by extending the existing boolean search with the “@” operator:

# Proximity search: search for the two words “following” and “comparison” within a distance of 19 bytes

SELECT * FROM articles
WHERE MATCH (title, body)
AGAINST ('"following comparison"@19' IN BOOLEAN MODE);

Creating FTS index:

Even though we have the same create index clause as MyISAM, and the index can be created on an already loaded table or as part of table creation, we recommend creating the index after the data has been loaded into the table. That way it can use our parallel create index machinery and fully exploit the scalability of the parallel tokenization and parallel sort options.

Users can tune this parallel index creation by adjusting the value of “innodb_ft_sort_pll_degree”. Currently it is only a server boot-time variable, but we could easily make it dynamically tunable.

As mentioned in the earlier blog “Overview and Getting Started with InnoDB FTS”, another point worth mentioning is the InnoDB Doc ID column (FTS_DOC_ID). This column stores the Document ID that uniquely identifies each document. If the user does not supply such a column, InnoDB creates a hidden column with this name, which normally means a clustered index rebuild. Users can avoid this by supplying their own “FTS_DOC_ID” column. A typical example would be:

1) CREATE TABLE fts_test (
       FTS_DOC_ID BIGINT UNSIGNED NOT NULL,
       title VARCHAR(200),
       body TEXT
   ) ENGINE=InnoDB;

2) CREATE UNIQUE INDEX FTS_DOC_ID_INDEX ON fts_test (FTS_DOC_ID);

The unique “FTS_DOC_ID_INDEX” index on “FTS_DOC_ID” is optional; without it, InnoDB will create it automatically.

3) Load data

4) CREATE FULLTEXT INDEX idx ON fts_test (title, body);


MyISAM supports real-time updates of its full-text index, which is one of its strengths. However, this comes with MyISAM’s limited DML concurrency. InnoDB’s behavior is more consistent with other transactional storage engines such as Oracle, whose FTS updates can be done at commit time, periodically, or manually.

For InnoDB FTS, the update of the FTS index is done at commit time. This delayed update means an additional step to fetch the data again for tokenization at commit time, which increases processing time. However, this design is necessary to ensure that concurrent multi-session DMLs proceed as efficiently as usual, and the importance of preserving this concurrency far outweighs the restrictions MyISAM requires to ensure its real-time updates.

This is a typical example of the difference between supporting FTS in a transactional, highly concurrent database engine and doing so in a non-transactional engine like MyISAM. In some respects we cannot compare them in an apples-to-apples fashion; in these areas it would be more reasonable to compare InnoDB FTS with Oracle Text.

InnoDB “Index Cache” and crash recovery

InnoDB FTS also has an “Index Cache” structure to cache insert results. Once this cache is full, its content is flushed to disk, into the auxiliary index tables. This is a structure unique to InnoDB FTS, and InnoDB provides a configuration variable, “innodb_ft_cache_size”, to control its size. The content of this index cache is also flushed to disk on a normal shutdown.

If the server crashes before the index cache content has been synced to the on-disk index tables, then after the server reboots, the first time the FTS index is used or there is an insertion/update into the table, the content missing due to the crash is reinstated into the index cache.

InnoDB FTS Index “Optimization”

Another special aspect of InnoDB is how it handles document deletion. On delete, the deleted document’s words are not immediately removed from the full-text index; instead, its Doc ID is registered in a “DELETED” table. Because of this, the FTS index can become “bloated” over time. To resolve this, we can “optimize” the index using the “OPTIMIZE TABLE” command when the configuration variable “innodb_optimize_fulltext_only” is turned on:

mysql> set global innodb_optimize_fulltext_only=1;

The optimize process can take a long time. To avoid blocking index access for a long period, another configuration variable, “innodb_ft_num_word_optimize”, lets you specify the number of words to optimize each time; subsequent calls to “OPTIMIZE TABLE” continue from where the previous one left off.

Stopword handling

MyISAM has a large default stopword list, which lets it create a smaller FTS index. In addition, it excludes any word that appears in more than 50% of the documents; such word exclusion can actually create unnecessary confusion. InnoDB FTS doesn’t have such a restriction, and our default stopword list is much smaller. You can view the default stopword list by selecting from an information schema table:

mysql> select * from INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;

InnoDB FTS also supports user-supplied stopwords. Users can provide their stopwords through a table with a single “varchar” column, and then tell InnoDB about the stopword table through the configuration variable “innodb_ft_server_stopword_table”:

# Define a correctly formatted user stopword table

create table user_stopword(value varchar(30)) engine = innodb;

# The set operation should be successful

set global innodb_ft_server_stopword_table = 'test/user_stopword';

Other Miscellaneous items:

1) New InnoDB Information Schema Table

InnoDB FTS adds a few Information Schema tables for diagnostic purposes; they might be of interest to users who want to explore the content of some internal auxiliary tables:

  1. INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE displays the in-memory “Index Cache” content.
  2. INFORMATION_SCHEMA.INNODB_FT_DELETED displays the “DELETED” table content.
  3. INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE displays the on-disk “Index Table” content.

Before selecting from these tables, the user needs to tell InnoDB which table’s FTS index he/she is interested in. This is done by setting the “innodb_ft_aux_table” configuration variable to the name of the table:

mysql> set global innodb_ft_aux_table='test/articles';


System configuration variables:

Besides the various system variables already mentioned, there are a couple of other configuration variables:

innodb_ft_min_token_size – the minimum word size, in bytes, that we will tokenize; default value is 3

innodb_ft_max_token_size – the maximum word size, in bytes, that we will tokenize; default value is 84

innodb_ft_num_word_optimize – the number of words optimized in each optimize operation

Please note, we do not honor MyISAM’s configuration variables such as “ft_min_word_len”, “ft_max_word_len”, “ft_stopword_file”, etc.


In summary, InnoDB FTS keeps most of its query syntax consistent with that used by MySQL/MyISAM, so users do not need to learn any additional commands. However, the internal design of InnoDB FTS is markedly different from that of MyISAM, so users should pay attention to the design differences and expect the possible behavior differences mentioned in this article.
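As a quick illustration of that shared syntax, a full-text query against an InnoDB table looks exactly like its MyISAM counterpart (table and column names here are illustrative):

```sql
CREATE TABLE articles (
  id INT AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200),
  body TEXT,
  FULLTEXT (title, body)
) ENGINE = InnoDB;

-- natural language mode (the default)
SELECT id, title FROM articles
WHERE MATCH (title, body) AGAINST ('full-text search');

-- boolean mode with a wildcard term
SELECT id, title FROM articles
WHERE MATCH (title, body) AGAINST ('+InnoDB -MyISAM data*' IN BOOLEAN MODE);
```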

Create InnoDB databases with 4k and 8k page sizes without recompiling

Note: this article was originally published on on July 25, 2011 by Kevin Lewis.

One of the features found in the summer 2011 labs release is the ability to select the InnoDB page size without recompiling.  Smaller page sizes may be useful for certain storage media such as SSDs where there is no need to minimize seek time between reads.

A new global setting called innodb-page-size can be set to 4k, 8k or 16k before creating a new MySQL instance. This sets the page size for all tablespaces used by that InnoDB instance.   This can be done in my.cnf or on the mysqld command line.  It is a read-only variable while the engine is running since it must be set before InnoDB starts up and creates a new system tablespace.  That happens when InnoDB does not find ibdata1 in the data directory.  If your system tablespace already exists using one page size and innodb-page-size is something else, the engine will not start.
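For example, a my.cnf fragment for creating a new instance with 4k pages (the setting can equivalently be passed on the mysqld command line):

```
[mysqld]
innodb-page-size=4k
```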

A few bugs were found and fixed related to smaller page sizes in InnoDB tablespaces. Those bug fixes can also be found in the 2011 summer labs release.

The mysql-test-run suite can be run like this to test a smaller page size:

perl mysql-test-run.pl --mysqld=--innodb-page-size=4k --suite=innodb --force

Other features of InnoDB work the same with smaller page sizes, but some limits are affected.  While each of the 4 row formats (Redundant, Compact, Compressed and Dynamic) is supported, the maximum record size is smaller.  Record length limits are calculated within InnoDB based on a number of factors including row format, column type and length, number of columns, secondary indexes, index prefix lengths, and of course, the page size.  Since the main record is stored in a clustered index, a minimum of two records must fit in each page.  So the maximum record length for 8k pages is about half that of 16k pages, and the maximum record size with 4k pages is half that of 8k pages.

If you have a database schema with any large records, you may not be able to use smaller page sizes.  Even if your records do barely fit in the clustered index page, it may not be advisable to use these smaller pages because the btree will be a lot deeper.  For example, if only 2 records fit on the page and there are  1,000,000 records, leaf pages are 20 levels down meaning InnoDB will need to read and search 20 pages to find the leaf page.  If that were on 4k pages, then using the same table on 16k pages would give 8 records per page and the leaf pages would only be 7 levels down.
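That depth arithmetic can be sketched in a few lines (a simplified model that assumes the same fanout at every level and ignores the smaller size of internal-node records, which in practice makes real trees shallower):

```python
import math

def btree_depth(total_records, records_per_page):
    """Approximate number of levels InnoDB must read to reach a leaf,
    assuming a uniform fanout equal to records_per_page."""
    pages = math.ceil(total_records / records_per_page)  # leaf pages needed
    depth = 1
    while pages > 1:                 # one extra level per round of grouping
        pages = math.ceil(pages / records_per_page)
        depth += 1
    return depth

print(btree_depth(1_000_000, 2))   # 4k pages, 2 records/page  -> 20 levels
print(btree_depth(1_000_000, 8))   # 16k pages, 8 records/page -> 7 levels
```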

There is a trick to reducing the size of records on the clustered index page.  VARCHAR fields can be converted to TEXT fields. (And VARBINARY can be converted to BLOB.)  InnoDB tries to put as much of VARCHAR fields in the clustered record as it can.  Normally, it will put as much as 767 bytes which is the index prefix limit on long fields.  But this labs release also contains a feature to increase that limit to 3072 bytes.  So if you have VARCHAR fields that use hundreds or thousands of bytes, they can make the clustered record very long.  TEXT columns, like BLOB columns, are stored in BLOB pages.  Only 20 bytes is needed in the record to link to that data.  Long VARCHAR fields are split at 767 bytes (or 3072 bytes if innodb_large_prefix=ON) and that 20 bytes is added to the clustered record to link to the rest of those fields.  If your VARCHAR(5000) field rarely ever uses more than 767 bytes, it will rarely be split up.  Still, you might get better performance by reducing the depth of the btree if you define long columns as TEXT instead of VARCHAR.

A file extent in InnoDB is 1 Mb independent of the page size.  So an extent will hold 64  16k pages, 128 8k pages and 256 4k pages.  This means that the read ahead mechanisms will read more pages with smaller page sizes since they read a whole extent at a time.  The doublewrite buffer, which is based on the size of an extent, will also contain more pages.
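The extent arithmetic is simply 1 MB divided by the page size:

```python
EXTENT_BYTES = 1024 * 1024  # an InnoDB extent is 1 MB regardless of page size

pages_per_extent = {size_kb: EXTENT_BYTES // (size_kb * 1024)
                    for size_kb in (16, 8, 4)}
print(pages_per_extent)  # {16: 64, 8: 128, 4: 256}
```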

If you want to use smaller page sizes with existing data, export the data first with a logical export utility such as mysqldump.  Then create the new mysql instance with innodb-page-size=4k or 8k and import the data.  Do not use a physical export method such as alter table … discard tablespace.


This feature makes it easier to try smaller page sizes in an InnoDB database. Just export your data, move or delete the system database (ibdata1) and the log files (ib_logfile0 & ib_logfile1), set innodb-page-size to either 4k or 8k, and restart MySQL.  A new InnoDB instance will be created with the smaller page size.  Then you can import your data and run your tests, all without recompiling InnoDB.

Reduced contention during datafile extension

Note: this article was originally published on on July 25, 2011 by Inaam Rana.

InnoDB has an internal file system management module that primarily manages the space in the data files. One of the pain points was the coarse level of locking used when a data file has to be extended. More about this issue can be found here. In the latest labs release we have fixed this problem.

When we need to extend a data file inside InnoDB, we write zero-filled pages synchronously to the file. The user thread which is extending the data file holds fil_system::mutex during the whole operation. This mutex covers changes to all data structures related to file system management. What this means is that a regular IO also needs to acquire fil_system::mutex, though only for a short time. Because the thread doing the data file extension holds the mutex during the whole IO operation, any other thread (user or background) trying to access a data file for a regular read or write ends up waiting. This brings the whole system to a virtual standstill, as no read or write activity can happen. This is true even if a thread is trying to access a data file other than the one being extended.

We fixed this issue by introducing an internal flag in the data structure indicating that a file is being extended. Now if a user thread needs to extend a data file, it still acquires fil_system::mutex but releases it after setting the flag. Once it is done with the extension IO, it resets the flag. This allows other threads to access data files while one of the files is being extended. It also allows multiple files to be extended in parallel. Our tests have shown that the stalls due to file extension are indeed fixed by this approach.
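The pattern can be sketched as follows (a toy model, not InnoDB's actual code; the class and names are invented for illustration). The key point is that the mutex is held only briefly to set and clear the in-progress flag, never across the slow zero-fill I/O:

```python
import threading

class FilSystem:
    """Sketch of the fix: hold the mutex only to set/clear an
    'extension in progress' flag, not for the whole (slow) I/O."""
    def __init__(self):
        self.mutex = threading.Lock()        # plays the role of fil_system::mutex
        self.cond = threading.Condition(self.mutex)
        self.being_extended = set()          # files currently being extended

    def extend_file(self, name, do_io):
        with self.mutex:                     # short critical section: set the flag
            while name in self.being_extended:
                self.cond.wait()
            self.being_extended.add(name)
        try:
            do_io()                          # slow zero-fill I/O, mutex NOT held
        finally:
            with self.mutex:                 # short critical section: clear the flag
                self.being_extended.discard(name)
                self.cond.notify_all()

    def regular_io(self, name):
        with self.mutex:                     # regular reads/writes still take the
            pass                             # mutex briefly, but no longer wait
                                             # behind a long-running extension
```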

A related feature which can be considered as future work is to off load the file extension to a background thread.

Reintroducing Random Readahead in InnoDB

Note: this article was originally published on on July 25, 2011 by Inaam Rana.

Yes, we are reintroducing random readahead. We took the random readahead code out after our internal benchmarking and community feedback showed that random readahead tends to adversely affect performance. However, some recent feedback from customers has shown that for some specific workloads random readahead can be beneficial. Based on this information we have decided to reintroduce random readahead, albeit with a dynamic switch to turn it on or off. By default, random readahead is not enabled. You can turn it on by setting innodb_random_read_ahead = TRUE.

The term 'random readahead' is a bit of an oxymoron, since readahead suggests a sequential access pattern. To clarify, I'll explain how InnoDB internally triggers a readahead request. Data pages in InnoDB are laid out on disk in chunks of 64 pages called extents. The readahead decision is essentially about whether it makes sense to read in an entire extent instead of reading pages one by one as and when they are requested by the user. There are two readahead mechanisms inside InnoDB, with some obvious and some subtle differences between them.

Linear readahead is the one that decides whether or not to read in the ‘next’ extent and the decision is made based not only on the number of pages of the ‘current’ extent present in the buffer pool but also the access pattern. The ‘current’ extent means the extent which is being accessed currently by the user threads. Whenever a user thread tries to access a page in the buffer pool the linear readahead mechanism is triggered. We first see if it is a boundary page of an extent or not. If it is a boundary page we figure out how many pages of this extent are present in the buffer pool and what is their access pattern. If there are more than innodb_read_ahead_threshold pages in the buffer pool and the access pattern is sequential and in the right order, InnoDB will queue asynchronous read requests for the entire next extent.

Random readahead is concerned mainly with the 'current' extent. The evaluation is done on each page read (note the difference from linear readahead, where conditions are evaluated on every page access). On every page read, if innodb_random_read_ahead is set, we check how many pages of this extent are in the buffer pool. If a certain number of pages are present and all of them have been accessed recently (i.e., they are close to the MRU end of the LRU list), we deduce that the remaining pages of the extent will be accessed soon as well, and queue asynchronous read requests for them.

To recap the differences between the two readahead mechanisms: for linear readahead the decision concerns the next extent, the conditions are evaluated on each page access, the access pattern is examined, and the decision is made if and only if we are accessing a boundary page of an extent. For random readahead the decision concerns the current extent, the conditions are evaluated only on a page read, only the recentness of access (not the access pattern) is taken into consideration, and the decision is not tied to the page being read being a boundary page.
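The two trigger rules above can be sketched as simple predicates (a simplified model; the random-readahead trigger value here is an illustrative parameter, not InnoDB's internal constant, and the real linear check also verifies access order and direction):

```python
PAGES_PER_EXTENT = 64  # an InnoDB extent holds 64 pages at the default 16k page size

def linear_readahead_fires(page_no, seq_pages_in_pool, threshold):
    """Evaluated on every page ACCESS, but can only fire at an extent
    boundary page, and only if at least `threshold` pages of the current
    extent were accessed sequentially (innodb_read_ahead_threshold)."""
    at_boundary = page_no % PAGES_PER_EXTENT in (0, PAGES_PER_EXTENT - 1)
    return at_boundary and seq_pages_in_pool >= threshold

def random_readahead_fires(recent_pages_in_pool, trigger):
    """Evaluated on every page READ: fires when enough pages of the
    current extent sit near the MRU end of the LRU list, regardless of
    access order or boundary position."""
    return recent_pages_in_pool >= trigger

# Linear readahead never fires mid-extent, however strong the pattern:
print(linear_readahead_fires(page_no=30, seq_pages_in_pool=56, threshold=56))  # False
print(linear_readahead_fires(page_no=63, seq_pages_in_pool=56, threshold=56))  # True
# Random readahead is indifferent to position within the extent:
print(random_readahead_fires(recent_pages_in_pool=20, trigger=13))             # True
```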

Friday May 06, 2011

InnoDB blogs, slides, etc.

Note: this article was originally published on on May 6, 2011 by Calvin Sun.

Last month was one of the busiest months for the InnoDB team:

The slides of the three talks are now available:

In addition to the blogs from the InnoDB team, Dimitri published a five-part series on MySQL 5.6 Performance, all on DimitriK’s (dim) Weblog:

  1. MySQL Performance: 5.6 Notes, part 5 – Fixing Adaptive Flushing..
  2. MySQL Performance: 5.6 Notes, part 4 – Fixing Purge issue..
  3. MySQL Performance: 5.6 Notes, part 3 – More in depth..
  4. MySQL Performance: 5.6 Notes, part 2 – Under full dbSTRESS workload…
  5. MySQL Performance: 5.6 Notes, part 1 – Discovery…

As usual, Dimitri's blogs are full of nice charts, detailed analysis, and sometimes critiques. In part 3, he talked about his experience with the InnoDB METRICS table, a new feature in the MySQL 5.6.2 release.

Øystein Grøvlen, a member of the MySQL optimizer team, published an article on InnoDB Persistent Statistics: InnoDB Persistent Statistics Save the Day, also a new feature in the MySQL 5.6.2 release.

The blogs written by InnoDB team:

  1. MySQL 5.6: Multi threaded purge, by Sunny Bains
  2. Tips and Tricks for Faster DDL, by Marko Mäkelä
  3. MySQL 5.6: Data dictionary LRU, by Sunny Bains
  4. Introducing page_cleaner thread in InnoDB, by Inaam Rana
  5. Information Schema for InnoDB System Tables, by Jimmy Yang
  6. InnoDB Persistent Statistics at last, by Vasil Dimov
  7. MySQL 5.6: InnoDB scalability fix – Kernel mutex removed, by Sunny Bains
  8. NoSQL to InnoDB with Memcached, by Calvin Sun
  9. Get started with InnoDB Memcached Daemon plugin, by Jimmy Yang
  10. Get started with InnoDB Metrics Table, by Jimmy Yang

Wednesday Apr 13, 2011

Get started with InnoDB Metrics Table

Note: this article was originally published on on April 13, 2011 by Jimmy Yang.

In our 5.6 release, we introduced a new feature, the "InnoDB Metrics Table", as part of our effort to extend InnoDB's diagnostic and monitoring capability. The "Metrics Table" feature consolidates all InnoDB performance and resource related counters (176 counters as of today) and makes them available through an information schema table. In addition, it gives users the ability to control these counters for their monitoring needs.

In the following sections, we will go over the feature in detail and focus on its usage to get you started with the "metrics counters".

1) The user interface – InnoDB Metrics Table

Before this feature, InnoDB already had dozens of performance counters, exposed to the user through MySQL system variables. Those counters are permanent, with no external controls on them. Nevertheless, they have become important probes that let users peek into the server's running state and make appropriate decisions to tune the system.

In an effort to extend our system monitoring capability, we decided to consolidate the monitoring machinery and created a dedicated monitoring module. The interface to the monitor counters is a relational table, so users can perform SQL queries on the results. Thus we have the "information_schema.innodb_metrics" table as the interface to the "Metrics Table" feature.

The table has the following 17 columns:

1.  NAME              string               counter name
2.  SUBSYSTEM         string               the module or feature the counter pertains to
3.  COUNT             int64                value since the counter was enabled
4.  MAX_COUNT         int64 (nullable)     max value since the counter was enabled
5.  MIN_COUNT         int64 (nullable)     min value since the counter was enabled
6.  AVG_COUNT         float (nullable)     average since the counter was enabled
7.  COUNT_RESET       int64                counter value since the last reset operation
8.  MAX_COUNT_RESET   int64 (nullable)     max value since the last reset
9.  MIN_COUNT_RESET   int64 (nullable)     min value since the last reset
10. AVG_COUNT_RESET   float (nullable)     avg value since the last reset
11. TIME_ENABLED      timestamp (nullable) timestamp of the last start
12. TIME_DISABLED     timestamp (nullable) timestamp of the last stop
13. TIME_ELAPSED      int64 (nullable)     time elapsed since the counter started (in seconds)
14. TIME_RESET        timestamp (nullable) timestamp of the last reset
15. STATUS            string               whether the counter is running or stopped
16. TYPE              string               whether this is an incremental counter or a resource counter
17. COMMENT           string               counter description

The columns are pretty much self-explanatory. The most basic ones are "NAME" and "COUNT", which give you the counter name and its value.

2) The Control Mechanism

In addition to letting users view counters through a relational table, we added four simple operations to let users control the counters:

  1. Enable Counting – starts the monitor counter.
  2. Disable Counting – stops the monitor counter.
  3. Reset – resets the monitor counter to zero while the counter is still in the "enabled" mode. Notice we have a "COUNT" column and a "COUNT_RESET" column; the "Reset" operation only resets the "COUNT_RESET" column.
  4. Reset All – this operation must be preceded by a "Disable" operation. It resets all values, including both the "COUNT" and "COUNT_RESET" columns.
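These reset semantics can be modeled in a few lines (a toy sketch for illustration, not the server's implementation; the class and method names are invented):

```python
class MonitorCounter:
    """Toy model of the COUNT vs COUNT_RESET semantics described above."""
    def __init__(self):
        self.enabled = False
        self.count = 0        # value since the counter was enabled
        self.count_reset = 0  # value since the last reset

    def enable(self):
        self.enabled = True

    def disable(self):
        self.enabled = False

    def inc(self, n=1):
        if self.enabled:
            self.count += n
            self.count_reset += n

    def reset(self):
        # allowed while enabled: only the *_RESET columns go back to zero
        self.count_reset = 0

    def reset_all(self):
        if self.enabled:
            raise RuntimeError("Cannot reset all values while the counter is on")
        self.count = 0
        self.count_reset = 0
```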

The above 4 operations are performed through the MySQL SET statement on 4 option variables:

  • Enable – To enable or turn on the counter, use “innodb_monitor_enable”:

mysql> set global innodb_monitor_enable = counter_name;

  • Disable  – To disable or turn off the counter, use “innodb_monitor_disable”:

mysql> set global innodb_monitor_disable = counter_name;

  • Reset – To reset the counter, use “innodb_monitor_reset”:

mysql> set global innodb_monitor_reset = counter_name;

  • Reset all – To reset all the values, use innodb_monitor_reset_all:

mysql> set global innodb_monitor_reset_all = counter_name;

Wildcard match for counter names:

Note that you have to state the counter name for the above operations. To make things easy, we support wildcard matching on the counter name, so users do not need to remember its exact name; they just need to include at least one wildcard character "%" in the counter_name string.

For example, if you would like to turn on the monitor "dml_inserts", you can use "dml_inserts" directly, or simply use "dml_i%" or "dml_%"; the latter will also turn on other counters such as "dml_deletes":

1) mysql> set global innodb_monitor_enable = dml_inserts;

Query OK, 0 rows affected (0.00 sec)

2) mysql> set global innodb_monitor_enable = "dml_%";

Check the counter, notice its status is in the “enabled” state:

mysql> select * from information_schema.innodb_metrics where name="dml_inserts" \G
*************************** 1. row ***************************
           NAME: dml_inserts
      SUBSYSTEM: dml
          COUNT: 0
      MAX_COUNT: 0
      AVG_COUNT: 0
   TIME_ENABLED: 2011-02-14 19:20:15
         STATUS: enabled
           TYPE: status_counter
        COMMENT: Number of rows inserted
1 row in set (0.01 sec)

Or if you want to turn on all monitors, you can simply use "%" as the "counter_name":

mysql> set global innodb_monitor_enable = "%";

Query OK, 0 rows affected (0.00 sec)

Finally, there are a few rules regarding wildcard matching of counter names.

  • Wildcard match only applies to individual counters and does not apply to modules (a concept discussed later).
  • At least one "%" must appear in the match string for the server to switch to wildcard matching. "%" matches multiple characters and "_" matches a single character. However, if the string contains only "_" and no "%", wildcard matching is not turned on.
  • As shown above, one or more "%" with no other characters means all counters are selected.
  • Users can supply a wildcard match string in the server configuration file too: loose-innodb_monitor_enable="lock%" (turning on monitors in the configuration file is discussed in the next section)

Turning on monitor counters with the server configuration file or startup options:

We also support turning on monitor counters through the server configuration file. Use the "innodb_monitor_enable" option to specify the monitor counters/groups you want to turn on at server startup. The counter names can be separated by ";", "," or space. For example, the following option turns on the monitor counter "metadata_table_opened" and all counters with the "lock" prefix:

loose-innodb_monitor_enable = "metadata_table_opened;lock%"
Monitor Counter Modules:

Monitor counters are grouped into modules, and each monitor counter is prefixed with its module name. For example, the counters "dml_inserts", "dml_deletes" and "dml_updates" all belong to the "DML" module. So far, about 19 modules are defined:

1 module_metadata
2 module_lock
3 module_buffer
4 module_buf_page
5 module_os
6 module_trx
7 module_purge
8 module_log
9 module_compress
10 module_index
11 module_adaptive_hash
12 module_table_space
13 module_change_buffer
14 module_innodb
15 module_file
16 module_ibuf_system
17 module_dml
18 module_ddl
19 module_icp

We created these modules to make the counters more manageable. You can apply these module names directly to the 4 set options defined above, and the action will apply to all counters in the module. For example:

set global innodb_monitor_enable = "module_dml";

3) Examples

Let's finish this quick tutorial with some examples, continuing with the "dml_inserts" counter:

1) Enable the dml_inserts counter

mysql> set global innodb_monitor_enable = "dml_i%";

2) Insert three rows, and check the counters again:

mysql> insert into test values(9);
Query OK, 1 row affected (0.00 sec)
mysql> insert into test values(10);
Query OK, 1 row affected (0.00 sec)
mysql> insert into test values(11);
Query OK, 1 row affected (0.00 sec)
mysql>  select * from information_schema.innodb_metrics where
name="dml_inserts" \G
*************************** 1. row ***************************
           NAME: dml_inserts
      SUBSYSTEM: dml
          COUNT: 3
      MAX_COUNT: 3
      AVG_COUNT: 0.030303030303030304
   TIME_ENABLED: 2011-02-14 19:20:15
         STATUS: enabled
           TYPE: status_counter
        COMMENT: Number of rows inserted
1 row in set (0.00 sec)

3) Reset – Let's reset the counter. Notice we could also specify "module_dml" to reset all counters in the DML module:

mysql> set global innodb_monitor_reset = module_dml;
Query OK, 0 rows affected (0.00 sec)

Let's select only the related columns; notice that only "count_reset" and "max_count_reset" are reset, not the "count" and "max_count" values:

    -> from information_schema.innodb_metrics
    -> where name="dml_inserts" \G
*************************** 1. row ***************************
           NAME: dml_inserts
          COUNT: 3
      MAX_COUNT: 3
   TIME_ENABLED: 2011-02-14 19:20:15
         STATUS: enabled
1 row in set (0.00 sec)

While the monitor counter is "enabled", we cannot use "innodb_monitor_reset_all" to reset all values, but we can apply "innodb_monitor_reset", which resets only the "*_RESET" values (such as "COUNT_RESET" and "MAX_COUNT_RESET") to 0. If you do use "innodb_monitor_reset_all" while the counter is in the enabled state, the following message is printed to the error log:

mysql>  set global innodb_monitor_reset_all = module_dml;
Query OK, 0 rows affected (0.00 sec)
InnoDB: Cannot reset all values for a monitor counter while it is on. Please
turn it off and retry.

4) The reset_all command can be used only after you disable the monitor counter:

mysql>  set global innodb_monitor_disable  = module_dml;
Query OK, 0 rows affected (0.00 sec)
mysql>  set global innodb_monitor_reset_all = module_dml;
Query OK, 0 rows affected (0.01 sec)
mysql> select * from information_schema.innodb_metrics where
    -> name="dml_inserts" \G
*************************** 1. row ***************************
           NAME: dml_inserts
      SUBSYSTEM: dml
          COUNT: 0
         STATUS: disabled
           TYPE: status_counter
        COMMENT: Number of rows inserted
1 row in set (0.01 sec)
While the monitor is in the "disabled" state, its counting activity is completely stopped:
mysql>  insert into test values(9);
Query OK, 1 row affected (0.00 sec)
mysql>  insert into test values(9);
Query OK, 1 row affected (0.01 sec)
mysql> select * from information_schema.innodb_metrics where
    -> name="dml_inserts" \G
*************************** 1. row ***************************
           NAME: dml_inserts
      SUBSYSTEM: dml
          COUNT: 0
         STATUS: disabled
           TYPE: status_counter
        COMMENT: Number of rows inserted
1 row in set (0.01 sec)

One last point before we finish. Some counters do not really behave as "counters". Most of them are resource related and display the amount of a resource in use; the total number of buffer pool pages is a good example. For these counters, we only display the current value, and the "type" column displays "value" instead of "status_counter":

mysql> set global innodb_monitor_enable = buffer_pool_total_page;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from information_schema.innodb_metrics where name =
    -> "buffer_pool_pages_total" \G
*************************** 1. row ***************************
           NAME: buffer_pool_pages_total
      SUBSYSTEM: buffer
          COUNT: 512
      MAX_COUNT: 512
      MIN_COUNT: 512
    COUNT_RESET: 512
   TIME_ENABLED: 2011-02-14 19:20:15
         STATUS: enabled
           TYPE: value
        COMMENT: Total buffer pool size in pages (innodb_buffer_pool_pages_total)


Monday Apr 11, 2011

Get started with InnoDB Memcached Daemon plugin

Note: this article was originally published on on April 11, 2011 by Jimmy Yang.

As Calvin mentioned in "NoSQL to InnoDB with Memcached", we just released a "technology preview" of the feature that makes memcached a MySQL daemon plugin. This "technology preview" demonstrates how users can bypass the SQL optimizer and query processing and interact directly with the InnoDB storage engine through the InnoDB APIs. Here, we would like to walk you step by step through getting the memcached daemon plugin set up and running.

If you would just like a brief introduction to the setup steps, there is a "README-innodb_memcached" in the mysql-5.6.2-labs-innodb-memcached package. What follows is a more elaborate description of those steps.

1) Prerequisite:

Currently, the memcached daemon plugin prototype is only supported on the Linux platform. As a prerequisite, you must have libevent installed, since it is required by memcached.

If you have the source code release, there is a libevent 1.4.3 included in the package (plugin/innodb_memcached/libevent). You can go to that directory and run autoconf, ./configure, make, and make install to install libevent.

2) Build the server

Assuming you would like to build the server yourself (otherwise, you can just skip this section), once libevent is installed, you can just build the MySQL server as usual.

Our source code is in the "plugin/innodb_memcached" directory. As part of the server build, it generates two shared libraries:

1) this is the memcached daemon plugin to MySQL

2) this is an InnoDB API plugin to memcached

Make sure the above two shared libraries are put in the MySQL plugin directory. You can find the MySQL plugin directory with "select @@plugin_dir":

mysql> select @@plugin_dir;
| @@plugin_dir                                                           |
| /home/jy/work2/mysql-5.6-memcached-labs/lib/plugin |
1 row in set (0.00 sec)

3) Install configuration tables:

Next, the memcached plugin needs a bit of configuration to know how to interact with InnoDB tables. We have a configuration script in "scripts/innodb_memcached_config.sql". You can install the necessary configuration tables by running "mysql < scripts/innodb_memcached_config.sql". If you are not interested in the details of these configuration tables, you can skip the rest of this section.

This configuration script installs the 3 tables needed by the InnoDB memcached plugin. These tables are created in a dedicated database, "innodb_memcache". We will go over these three tables in a bit more detail:

1) "containers" – This table is the most important one for the memcached-to-InnoDB mapping. It describes the table used to store the memcached values. Currently, memcached can be mapped to only one table at a time, so essentially there will be only one row in this table. In the future, we may make this configuration more flexible and dynamic, so that users can map memcached operations to multiple tables.

The mapping is done through specifying corresponding column values in the table:

  • "db_schema" and "db_table" describe the database and table name for storing the memcached values.
  • "key_columns" names the single column used as the "key" for memcached operations.
  • "value_columns" names the columns (can be multiple) used as the "values" for memcached operations. Users can specify multiple columns by separating them with commas (such as "col1,col2").
  • "unique_idx_name_on_key" is the name of the index on the "key" column. It must be a unique index; it can be primary or secondary.

The above 5 column values (database name, table name, key column, value columns and index name) must all be supplied; otherwise, the setup will fail.

The following columns are optional; however, to fully comply with the memcached protocol, you will need to supply these column values too.

  • "flags" names the column used as the "flags" value for memcached. It is also used as a "column specifier" for some operations (such as incr and prepend) when the memcached "value" is mapped to multiple columns, so that the operation is done on the specified column. For example, if you have mapped the value to 3 columns and only want the "increment" operation performed on one of them, you can use flags to specify which column is used for those operations.
  • “cas_column” and “exp_column” are used specifically to store the “cas” and “exp” value of memcached.

2) Table "cache_policies" specifies whether to use InnoDB as the data store of memcached (innodb_only), memcached's "default engine" as the backing store (cache-only), or both (caching). In the last case, an operation is forwarded to InnoDB only if the default engine operation fails (for example, if we cannot find a key in memory, we then search InnoDB).

3) Table "config_options" – currently, we support only one config option through this table: the "separator" used to split one long string into smaller values for multiple value columns. For example, if you define "col1,col2" as value columns and "|" as the separator, you can issue the following command in memcached to insert values into col1 and col2 respectively:

set keyx 10 0 19


So "valuecol1x" will be stored in col1 and "valuecoly" in col2.
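The splitting behavior can be sketched as follows (an assumed helper for illustration, not the plugin's actual code):

```python
def split_memcached_value(value, value_columns, separator="|"):
    """Split one memcached value string into per-column values using the
    configured separator, pairing pieces with the mapped value columns."""
    parts = value.split(separator)
    return dict(zip(value_columns, parts))

print(split_memcached_value("valuecol1x|valuecoly", ["col1", "col2"]))
# {'col1': 'valuecol1x', 'col2': 'valuecoly'}
```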

4) Example tables

Finally, as part of the configuration script, we create a table "demo_test" in the "test" database as an example. It also allows the memcached daemon to work out of the box, with no additional configuration needed.

As you might notice, this "demo_test" table has more columns than needed, so it relies on the entries in the "containers" table to tell which column is used for what purpose, as described above.

4) Install the Daemon Plugin

The final step is installing the daemon plugin, which is the same as installing any other MySQL plugin:

mysql> install plugin daemon_memcached soname "";

If you have any memcached-specific configuration parameters, even though they take effect when the plugin is installed, you need to specify them at MySQL server boot time or enter them in the MySQL configuration file.

For example, if you would like memcached to listen on port "11222" instead of the default port "11211", you need to add "-p11222" to the MySQL system configuration variable "daemon_memcached_option":

mysqld .... --loose-daemon_memcached_option="-p11222"

Of course, you can add other memcached command line options to “daemon_memcached_option” string.

The other configuration options are:

1) daemon_memcached_engine_lib_name (default "")

2) daemon_memcached_engine_lib_path (default NULL, meaning the plugin directory)

By default, you do not need to set or change these two options. They exist because they allow you to load any other storage engine for memcached (such as the NDB memcached engine), which opens the door for more interesting exploration.

3) daemon_memcached_r_batch_size – the batch commit size for read operations (get). It specifies after how many operations we do a commit. By default, this is set to a very large number, 1048576.

4) daemon_memcached_w_batch_size – the batch commit size for any write operations (set, replace, append, prepend, incr, decr, etc.). By default, this is set to 32.

Again, please note that you need to put these configuration parameters in your MySQL configuration file or on the MySQL boot command line. They take effect when you load the memcached plugin.
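For example, a my.cnf fragment combining the options above (the values are illustrative; a batch size of 1 trades throughput for immediate visibility of each operation):

```
[mysqld]
loose-daemon_memcached_option=-p11222
loose-daemon_memcached_r_batch_size=1
loose-daemon_memcached_w_batch_size=1
```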

5) Start to play:

Now that you have everything set up, you can start to play:

telnet localhost 11211

set a11 10 0 9
123456789
STORED
get a11
VALUE a11 0 9
123456789
END

You can access the InnoDB table ("demo_test") through the standard SQL interface. However, there are some catches:

1) If you would like to take a look at what's in the "demo_test" table, remember that we batch the commits (32 operations by default). So you will need to do a "read uncommitted" select to find the just-inserted rows:

mysql> set session transaction isolation level
    -> read uncommitted;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from demo_test;
| cx   | cy   | c1   | cz   | c2        | ca   | CB   | c3   | cu   | c4   | C5   |
| NULL | NULL | a11  | NULL | 123456789 | NULL | NULL |   10 | NULL |    3 | NULL |
1 row in set (0.00 sec)

2) The InnoDB table will be IS (intention shared) or IX (intention exclusive) locked for all operations in a transaction. So unless you change "daemon_memcached_r_batch_size" and "daemon_memcached_w_batch_size" to a small number (like 1), the table is most likely intention-locked between operations, so you cannot do any DDL on the table.


Now you have everything set up and can interact directly with the InnoDB storage engine through the memcached interface. In addition, as you might have noticed while going through this extended "README", there are still a lot of interesting options open for exploration and enhancement. This is the beginning of opening InnoDB to the outside world, and NoSQL is a perfect place for it to play.


This is the InnoDB team blog.

