Saturday Sep 21, 2013

InnoDB 5.7 performance improvements

A quick overview of the InnoDB performance improvements for both read-only and read-write loads.

Saturday Sep 29, 2012

Performance Enhancement in Full-Text Search Query

Ever since its first release, we have continued to consolidate and develop the InnoDB Full-Text Search feature. One recent improvement is worth blogging about: a joint effort with the MySQL Optimizer team that simplifies the query plans of some common queries and dramatically shortens the query time. I will describe the issue, our solution, and the end result, with some performance numbers to demonstrate our continuing enhancement of the Full-Text Search capability.

The Issue:

As we discussed in previous blogs, InnoDB implements the Full-Text index as inverted auxiliary tables. A query, once parsed, is reinterpreted into several queries against the related auxiliary tables; the results are then merged and consolidated into the final result. So at the end of the query, we'll have all matching records on hand, sorted by their ranking or by their Doc IDs.

Unfortunately, MySQL's optimizer and query processing were initially designed for the MyISAM Full-Text index, and sometimes did not fully utilize the complete result set from InnoDB.

Here are a couple of examples:

Case 1: Query result ordered by Rank with only top N results:

mysql> SELECT FTS_DOC_ID, MATCH (title, body) AGAINST ('database')

AS SCORE FROM articles ORDER BY score DESC LIMIT 1;

In this query, the user tries to retrieve the single record with the highest ranking. The answer should come quickly once we have all the matching documents on hand, especially since they are already ranked. However, before this change, MySQL would retrieve rankings for almost every row in the table, sort them, and only then come up with the top-ranked result. This whole retrieve-and-sort is quite unnecessary, given that InnoDB already has the answer.

In a real-life case, the user could have millions of rows, so in the old scheme MySQL would retrieve and sort millions of rows' rankings, even if the full-text search had already found that only 3 rows match. Apparently, the million-row ranking retrieval is done in vain. In the above case, MySQL should just ask for the 3 matched rows' rankings; all other rows' rankings are 0. If it wants the top ranking, it can simply take the first record from our already sorted result.

Case 2: Select Count(*) on matching records:

mysql> SELECT COUNT(*) FROM articles WHERE MATCH (title,body)

AGAINST ('database' IN NATURAL LANGUAGE MODE);

In this case, the InnoDB search can find the matching rows quickly and will have all of them on hand. However, before our change, in the old scheme, every row in the table was requested by MySQL one by one, just to check whether its ranking is larger than 0, and a count was produced afterwards.

In fact, there is no need for MySQL to fetch all rows; InnoDB already has all the matching records. The only thing needed is to call an InnoDB API to retrieve the count.

The difference can be huge. The following query output shows just how big it can be:

mysql> select count(*) from searchindex_inno where match(si_title, si_text) against ('people');

+----------+
| count(*) |
+----------+
|   666877 |
+----------+
1 row in set (16 min 17.37 sec)

So the query took almost 16 minutes.

Let's see how long it takes InnoDB itself to come up with the result. In InnoDB, you can obtain an extra diagnostic printout by turning on innodb_ft_enable_diag_print; this prints extra query info to the error log:

Error log:

keynr=2, 'people'
NL search
Total docs: 10954826 Total words: 0
UNION: Searching: 'people'
Processing time: 2 secs: row(s) 666877: error: 10
ft_init()
ft_init_ext()
keynr=2, 'people'
NL search
Total docs: 10954826 Total words: 0
UNION: Searching: 'people'
Processing time: 3 secs: row(s) 666877: error: 10

The output shows it took InnoDB only 3 seconds to get the result, while the whole query took 16 minutes to finish. A large amount of time has been wasted on the unneeded row fetching.
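For reference, this diagnostic output can be switched on and off at run time; here is a sketch, assuming innodb_ft_enable_diag_print is a dynamic global variable in your build (check your version's documentation):

mysql> SET GLOBAL innodb_ft_enable_diag_print = ON;
mysql> select count(*) from searchindex_inno where match(si_title, si_text) against ('people');
mysql> SET GLOBAL innodb_ft_enable_diag_print = OFF;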

The Solution:

The solution is obvious: MySQL can skip some of its steps, optimize its plan, and obtain useful information directly from InnoDB. The savings from doing this include:

1) Avoid redundant sorting. Since InnoDB has already sorted the result according to ranking, the MySQL query processing layer does not need to sort again to get the top matching results.

2) Avoid row-by-row fetching to get the matching count. InnoDB provides all the matching records; any row not in the result list has a ranking of 0 and does not need to be retrieved. InnoDB also has a count of the total matching records on hand, so there is no need to recount.

3) Covered index scan. The InnoDB result always contains the matching records' Document IDs and their rankings. So if only the Document ID and ranking are needed, there is no need to go to the user table to fetch the record itself.

4) Narrow the search result early to reduce user table access. If the user wants the top N matching records, we do not need to fetch all matching records from the user table. We can first select the top N matching Doc IDs, and then fetch only the corresponding records for these Doc IDs, as the sketch below illustrates.
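To make points 3) and 4) concrete, here is a hand-written illustration against the articles table from Case 1: first pick the top 3 Doc IDs and rankings (answerable entirely from the FTS result), then fetch only those rows. This is purely illustrative; with the change described here the optimizer achieves the same effect on the original query without any manual rewriting:

mysql> SELECT a.*
       FROM articles a
       JOIN (
           SELECT FTS_DOC_ID, MATCH (title, body) AGAINST ('database') AS score
           FROM articles
           WHERE MATCH (title, body) AGAINST ('database')
           ORDER BY score DESC
           LIMIT 3
       ) top_docs ON a.FTS_DOC_ID = top_docs.FTS_DOC_ID
       ORDER BY top_docs.score DESC;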

Performance Results and comparison with MyISAM

The result of this change is very obvious. I include six test results, performed by Alexander Rubin, to demonstrate how fast the InnoDB query now is compared with MyISAM Full-Text Search.

These tests are based on the English Wikipedia data of 5.4 million rows, approximately a 16 GB table. The test was performed on a machine with 1 dual-core CPU, an SSD drive, 8 GB of RAM, and innodb_buffer_pool_size set to 8 GB.

Table 1: SELECT with LIMIT CLAUSE

mysql> SELECT si_title, match(si_title, si_text) against('family') as rel FROM si WHERE match(si_title, si_text) against('family') ORDER BY rel desc LIMIT 10;


                     InnoDB     MyISAM            Times Faster
Time for the query   1.63 sec   3 min 26.31 sec   127

You can see that for this particular query (retrieving the top 10 records), InnoDB Full-Text Search is now approximately 127 times faster than MyISAM.

Table 2: SELECT COUNT QUERY

mysql> select count(*) from si where match(si_title, si_text) against('family');

+----------+
| count(*) |
+----------+
|   293955 |
+----------+


                     InnoDB     MyISAM             Times Faster
Time for the query   1.35 sec   28 min 59.59 sec   1289

In this particular case, where there are 293k matching results, InnoDB took only 1.35 seconds to get all of them, while MyISAM took almost half an hour; that is about 1289 times faster!

Table 3: SELECT ID with ORDER BY and LIMIT CLAUSE for selected terms

mysql> SELECT <ID>, match(si_title, si_text) against(<TERM>) as rel FROM si_<TB> WHERE match(si_title, si_text) against (<TERM>) ORDER BY rel desc LIMIT 10;

Term                                        InnoDB (time to execute)   MyISAM (time to execute)   Times Faster
family                                      0.5 sec                    5.05 sec                   10.1
family film                                 0.95 sec                   25.39 sec                  26.7
Pizza restaurant orange county California   0.93 sec                   32.03 sec                  34.4
President united states of America          2.5 sec                    36.98 sec                  14.8


Table 4: SELECT title and text with ORDER BY and LIMIT CLAUSE for selected terms

mysql> SELECT <ID>, si_title, si_text, ... as rel FROM si_<TB> WHERE match(si_title, si_text) against (<TERM>) ORDER BY rel desc LIMIT 10;

Term                                        InnoDB (time to execute)   MyISAM (time to execute)   Times Faster
family                                      0.61 sec                   41.65 sec                  68.3
family film                                 1.15 sec                   47.17 sec                  41.0
Pizza restaurant orange county california   1.03 sec                   48.2 sec                   46.8
President united states of america          2.49 sec                   44.61 sec                  17.9

Table 5: SELECT ID with ORDER BY and LIMIT CLAUSE for selected terms

mysql> SELECT <ID>, match(si_title, si_text) against(<TERM>) as rel  FROM si_<TB> WHERE match(si_title, si_text) against (<TERM>) ORDER BY rel desc LIMIT 10;

Term                                        InnoDB (time to execute)   MyISAM (time to execute)   Times Faster
family                                      0.5 sec                    5.05 sec                   10.1
family film                                 0.95 sec                   25.39 sec                  26.7
Pizza restaurant orange county california   0.93 sec                   32.03 sec                  34.4
President united states of america          2.5 sec                    36.98 sec                  14.8


Table 6: SELECT COUNT(*)

mysql> SELECT count(*) FROM si_<TB> WHERE match(si_title, si_text) against (<TERM>) LIMIT 10;

Term                                        InnoDB (time to execute)   MyISAM (time to execute)   Times Faster
family                                      0.47 sec                   82 sec                     174.5
family film                                 0.83 sec                   131 sec                    157.8
Pizza restaurant orange county california   0.74 sec                   106 sec                    143.2
President united states of america          1.96 sec                   220 sec                    112.2

Again, Tables 3 to 6 all show InnoDB consistently outperforming MyISAM in these queries by a large margin. It is obvious that InnoDB has a great advantage over MyISAM in handling searches over large data sets.

Summary:

These results demonstrate the great performance we can achieve by making the MySQL optimizer and InnoDB Full-Text Search more tightly coupled. I think there are still many cases where InnoDB's result info has not been fully taken advantage of, which means we still have great room to improve. We will continue to explore this area and get more dramatic results for InnoDB full-text searches.

Jimmy Yang, September 29, 2012

Helping to Reduce the Page Compression Failure Rate

When InnoDB compresses a page it needs the result to fit into its predetermined compressed page size (specified with KEY_BLOCK_SIZE). When the result does not fit we call that a compression failure. In this case InnoDB needs to split up the page and try to compress again. That said, compression failures are bad for performance and should be minimized.
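As a reminder, page compression applies to tables created with ROW_FORMAT=COMPRESSED; a minimal, hypothetical example (this typically requires innodb_file_per_table and the Barracuda file format):

mysql> CREATE TABLE t1 (
           id INT PRIMARY KEY,
           payload VARCHAR(1000)
       ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;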

Whether the result of the compression will fit largely depends on the data being compressed and some tables and/or indexes may contain more compressible data than others. And so it would be nice if the compression failure rate, along with other compression stats, could be monitored on a per table or even on a per index basis, wouldn't it?

This is where the new INFORMATION_SCHEMA table in MySQL 5.6 kicks in. INFORMATION_SCHEMA.INNODB_CMP_PER_INDEX provides exactly this helpful information. It contains the following fields:

+-----------------+--------------+------+
| Field           | Type         | Null |
+-----------------+--------------+------+
| database_name   | varchar(192) | NO   |
| table_name      | varchar(192) | NO   |
| index_name      | varchar(192) | NO   |
| compress_ops    | int(11)      | NO   |
| compress_ops_ok | int(11)      | NO   |
| compress_time   | int(11)      | NO   |
| uncompress_ops  | int(11)      | NO   |
| uncompress_time | int(11)      | NO   |
+-----------------+--------------+------+

The table is similar to INFORMATION_SCHEMA.INNODB_CMP, but this time the data is grouped by "database_name,table_name,index_name" instead of by "page_size".

So a query like

SELECT
database_name,
table_name,
index_name,
compress_ops - compress_ops_ok AS failures
FROM information_schema.innodb_cmp_per_index
ORDER BY failures DESC;
would reveal the most problematic tables and indexes that have the highest compression failure rate.

From there, the way to improve performance would be to try increasing the compressed page size, or changing the structure of the tables/indexes or the data being stored, and then see whether that has a positive impact on performance.
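One note: in the MySQL 5.6 builds I am aware of, these per-index compression statistics are only gathered while innodb_cmp_per_index_enabled is ON (it defaults to OFF because of its overhead), so you may first need something like the following; check your version's documentation:

mysql> SET GLOBAL innodb_cmp_per_index_enabled = ON;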

Wednesday Apr 11, 2012

InnoDB performance improvements

Note: this article was originally published on http://blogs.innodb.com on April 11, 2012 by Sunny Bains.

The problem
After making several performance fixes, notable among them the kernel mutex split and the new handling of read-only transactions, in particular non-locking auto-commit read-only transactions, we weren't seeing any increase in transactions per second (TPS) on our high-end hardware. On one particular host, a 24-core machine with 2 threads per core, the TPS using Sysbench was a tepid 5.6K at 16 threads and more or less plateaued up to 1K user threads. No matter what config settings we used, we would end up with more or less the same result.

We ended up getting together for a meeting in Paris to discuss this issue, and during the brainstorming, one of the potential issues that cropped up was the effect of cache coherence and/or false sharing. Using the excellent Linux tool perf, we were able to narrow it down to a global statistics counter in row_sel_search_for_mysql(). Mikael Ronstrom explains this in more detail.

The solution
Create a generic counter class (the InnoDB code is now C++) that splits the counter into multiple (configurable) slots, each on a separate 64-byte cache line, and use the thread id of the updating thread to index into a slot. This reduces the contention/sharing and had the desired effect: the TPS went from 5.6K to 15K at 64 user threads and stayed close to stable right up to 1K threads, with very slow degradation. This was using Sysbench OLTP_RO for autocommit non-locking read-only queries (Sysbench option --oltp-skip-trx=off).

The code and binary can be downloaded from the labs release downloads; the current release is mysql-5.6.6-labs-april-2012-*. See the code in include/os0thread.h. The new class is ib_counter_t.

We have now refactored the code and grouped all the InnoDB statistics counters in srv_counter_t. This will help with further consolidation and improvements. Currently, most of the InnoDB config and statistics variables are defined in srv0srv.cc (with a few exceptions). We need to start paying even more attention to their layout and alignment from now on; there seem to be some false sharing issues that we haven't completely identified yet.

Results
I think it is better to look at Dimitri’s blog for results that reflect the improvements.

Enjoy!

Wednesday Jul 27, 2011

InnoDB Full-Text Search Performance

Note: this article was originally published on http://blogs.innodb.com on July 27, 2011 by Vinay.

Long-awaited InnoDB support for full-text indexes is available in the InnoDB lab release. This post talks about the performance of InnoDB's full-text support compared with MyISAM's. (There are multiple aspects of full-text index performance; the most important ones are full-text index creation time and query execution time.) We are not focusing on performance with DML (except some simple inserts on loaded data) or other supported character sets (numbers are based on latin1 character-set data).

The numbers are encouraging in terms of 'create index' time, where InnoDB is comparable to or faster than MyISAM, and query execution times are comparable with MyISAM. Over time, we can expect further optimization of query performance.

Please refer to "Overview and Getting Started with InnoDB FTS" and "InnoDB Full-Text Search Tutorial", as this blog focuses on the full-text search performance part.

a) Full-Text Search Create Index Time
Full-text index creation time depends on the number of words to be indexed, which is indirectly dependent on the length of the words you want to index. By default, all words with length >= 3 bytes get indexed in InnoDB (the MyISAM default minimum word length is 4).

When you are comparing 'create index' time, make sure MyISAM and InnoDB are indexing the same number of words from the data. InnoDB uses the innodb_ft_min_token_size variable to set the minimum word length. The number of indexed words also depends on the maximum word length to be indexed, but that is generally not changed. The default stop-word list for InnoDB contains fewer words than MyISAM's stop-word list, so InnoDB will index more words.
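A quick sanity check before comparing the two engines is to look at their minimum indexed word lengths; a sketch (both innodb_ft_min_token_size and MyISAM's ft_min_word_len are start-up variables, so changing them requires setting them in the configuration file and rebuilding the indexes):

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_ft_min_token_size';
mysql> SHOW GLOBAL VARIABLES LIKE 'ft_min_word_len';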

Looking at the InnoDB full-text search architecture details, it is clear that using a user-defined "FTS_DOC_ID" column and the server parameter innodb_ft_sort_pll_degree helps to increase performance. Let us see what these parameters mean, so we get an idea of why they are used and tuned to get better performance.

1)
InnoDB full-text search requires a monotonically increasing "Doc ID" column for the fulltext index. There is no requirement that the user add this column; InnoDB will add it automatically if the column is missing, at the cost of rebuilding the unique/primary index. From a performance point of view, it is good for the user to pre-define this column, saving the index rebuild. As of now, it needs to be an 8-byte unique non-null BIGINT column, and it must have the reserved name "FTS_DOC_ID" (all upper case).

For example, the user can avoid an index rebuild by the server by defining the "FTS_DOC_ID" column when creating the table:
CREATE TABLE wp (
  FTS_DOC_ID BIGINT unsigned NOT NULL AUTO_INCREMENT,
  title varchar(255) NOT NULL DEFAULT '',
  text mediumtext NOT NULL,
  PRIMARY KEY (`FTS_DOC_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE UNIQUE INDEX FTS_DOC_ID_INDEX on wp(FTS_DOC_ID);

Note: the "FTS_DOC_ID" column (reserved column name, all upper case) and the FTS_DOC_ID_INDEX (reserved index name) are added here. These two are used by InnoDB internally; if either is missing, InnoDB will create it for you.
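With FTS_DOC_ID and FTS_DOC_ID_INDEX defined up front as above, the full-text index itself can then be added without the table rebuild; a sketch (the index name idx_title_text is arbitrary):

mysql> CREATE FULLTEXT INDEX idx_title_text ON wp (title, text);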

If you are not in a hurry and don't want to manage your own Doc ID, then you can just create the table as follows. A subsequent "create fulltext index" call will rebuild the table, add the FTS_DOC_ID column as a hidden column, and add the FTS_DOC_ID_INDEX on FTS_DOC_ID.
CREATE TABLE wp (
  title varchar(255) NOT NULL DEFAULT '',
  text mediumtext NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

2)
InnoDB full-text search has tokenization and sort phases which are handled using threads. The innodb_ft_sort_pll_degree variable determines how many parallel threads to use for tokenization and sorting. It has been observed that 'create index' is faster with larger values of innodb_ft_sort_pll_degree on multi-core machines.

The default value of innodb_ft_sort_pll_degree is 2. This is a static variable, so it must be set at server start-up time; it may become a dynamic variable in a later enhancement of InnoDB full-text search.
* Use a higher value of innodb_ft_sort_pll_degree on multi-core machines (it takes the values 2, 4, 8, 16, ...).

The following numbers were obtained with this setup:

Schema: Table with number of words (as defined above)
Index : (title, text) columns
Data Size: approx. 2.7 GB, 1 million rows, 238 million words
Platform: Linux x86 64-bit, 3 GHz, 8 cores, 32 GB RAM

‘Create Index’ Performance


innodb_ft_sort_pll_degree   Default (2)    4              8              16
InnoDB                      7 min 48.12    5 min 12.06    4 min 10.95    3 min 33.4
MyISAM                      11 min 42.57   11 min 32.27   11 min 32.79   11 min 25.17

Note: innodb_ft_sort_pll_degree does not affect MyISAM, hence its 'create index' time stays almost the same, around 11 minutes.

One thing to note is that InnoDB full-text index creation uses extensive merge sorting, so on some systems we see memory usage larger than expected, due to the OS file system cache. You can turn off this OS file system caching on Linux or Solaris by setting innodb_disable_sort_file_cache to TRUE, which enables direct I/O. However, doing so might hurt merge sort performance to some extent.
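In later releases innodb_disable_sort_file_cache is exposed as a dynamic variable; assuming your build supports that, a sketch:

mysql> SET GLOBAL innodb_disable_sort_file_cache = ON;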

It has been observed that CPU utilization is in proportion to the innodb_ft_sort_pll_degree value. 'Create index' in InnoDB goes through phases called tokenization, merge sort, and insert. CPU utilization is listed below for each phase respectively, along with memory utilization, for the default value (2) of innodb_ft_sort_pll_degree. (Note: the CPU utilization values for the different phases are snapshot values taken during that phase.)

CPU utilization per core, with innodb_ft_sort_pll_degree = Default (2):

Phase          core 0   core 1   core 2   core 3   core 4   core 5   core 6   core 7
Tokenization   0        0        0        0        96.97    0        0        97.11
Merge Sort     20.23    0        0        1        0        46.56    0        1
Insert Data    26       24.14    36.14    28.14    37       36.68    39.7     31

Memory (%): 5.8% across all cores

The above scenario, when repeated with a utf8 character-set table, also showed improvement in 'create index' time with increased values of innodb_ft_sort_pll_degree. Insertion of 100 records on top of the 1M records was also faster for InnoDB than MyISAM: InnoDB inserted the 100 records in 10 milliseconds, whereas MyISAM took 30 milliseconds.

b) Query Execution Time With Full-text Index
In this section, we briefly go over the execution time of some typical full-text search queries:

Queries used are:
1) select count(*) from wp where match(title,text) against ('Creationist +Abrahamic -creationism' in boolean mode);
2) select count(*) from wp where match(title,text) against ('Abrahamic');
3) select count(*) from wp where match(title,text) against ('preassumtions +orangutan' in boolean mode);
4) select count(*) from wp where match(title,text) against ('orangutan +falsified ~naturalistically' in boolean mode);
5) select count(*) from wp where match(title,text) against ('+india* -leader +gandh*' in boolean mode);
6) select count(*) from wp where match(title,text) against ('"american culture"@09' in boolean mode);

Query                    InnoDB (ms) *   InnoDB FTS Search Processing Time (ms)   MyISAM (ms)
1                        10              2                                        10
2                        130             1                                        10
3                        1               1                                        10
4                        2               2                                        10
5                        320             116                                      1000
6 (New InnoDB Feature)   5250                                                     NA


* FTS search processing time is the actual time spent by the InnoDB full-text index scan.

There are still activities going on to synchronize the MySQL Optimizer and query processing layer with the InnoDB FTS, so some query types are not yet fully tuned. There is further room for improvement for various queries.


Summary:

We see that 'create index' performance scales well with the parallel sort degree, and overall query execution time is comparable with MyISAM.

Sunday Sep 19, 2010

MySQL 5.5: InnoDB Performance Improvements on Windows

Note: this article was originally published on http://blogs.innodb.com on Sept 19, 2010 by Calvin Sun.

At MySQL, we know our users want performance, scalability, reliability, and availability, regardless of the platform they choose to deploy on. We have always had excellent benchmarks on Linux, and with MySQL 5.5, we are also working hard on improving performance on Windows.

The original patch improving Windows performance was developed by MySQL senior developer Vladislav Vaintroub, with benchmarks by QA engineer Jonathan Miller. We integrated the patch into the MySQL 5.5 release.

The following two charts show the comparison of MySQL 5.5 vs. MySQL 5.1 (plugin) vs. MySQL 5.1 (builtin) using sysbench:

MySQL 5.5 includes a Windows performance patch, plus other performance improvements. This Windows performance patch has two parts: part one fixes the inefficiency of InnoDB's slow mutex implementation on Windows by implementing the slow mutex as a CriticalSection; part two takes advantage of condition variables on Windows Vista or newer Windows operating systems. Condition variables are not supported on Windows Server 2003 and Windows XP/2000.

What if I have an old Windows version on which condition variables are not supported? Well, you can still benefit from the new slow mutex implementation, but you will not be able to take advantage of condition variables. The same binary runs on old Windows too: InnoDB does a dynamic check during start-up to see whether condition variables are supported by the operating system it runs on.

Another benefit from this patch is the reduced consumption of Windows kernel objects (or handles), which deserves another blog on its own.

Download MySQL 5.5 RC release to give it a try for yourself! Thanks to Vladislav and Jonathan for their hard work!

Hardware configuration and operating system

Intel x86_64, 4 CPU x 2 Cores , 3.166 GHz, 8GB RAM, and Windows Server 2008.

MySQL / InnoDB configuration

Relevant my.cnf parameters:

max_connections=500
max_connect_errors=50
table_cache=2048
query-cache-size=0
query-cache-type=0
disable-log-bin
transaction_isolation=REPEATABLE-READ
innodb-file-per-table=1
innodb_data_file_path=ibdata1:200M:autoextend
innodb_buffer_pool_size=4G
innodb_additional_mem_pool_size=20M
innodb_log_file_size=650M
innodb_log_files_in_group=2
innodb_log_buffer_size=16M
innodb_support_xa=0
innodb_doublewrite=1
innodb_thread_concurrency=0
innodb_flush_log_at_trx_commit=1

Monday May 17, 2010

InnoDB recovery gets even faster in Plugin 1.1, thanks to native AIO

Note: this article was originally published on http://blogs.innodb.com on April 16, 2010 by Michael Izioumtchenko.

InnoDB Plugin 1.1 doesn’t add any recovery specific improvements on top of what we already have in Plugin 1.0.7. The details on the latter are available in this blog. Yet, when I tried to recover another big recovery dataset I created, I got the following results for total recovery time:

  • Plugin 1.0.7: 46min 21s
  • Plugin 1.1: 32min 41s

Plugin 1.1 recovery is 1.5 times faster. Why would that happen? The numerous concurrency improvements in Plugin 1.1 and MySQL 5.5 can’t really affect the recovery. The honor goes to Native Asynchronous IO on Linux. Let’s try without it:

  • Plugin 1.1 with --innodb-use-native-aio=0: 49min 07s

which is about the same as 1.0.7 time. My numerous other recovery runs showed that the random fluctuations account for 2-3min of a 30-45min test.

Why is native AIO good for you? Why is it better than the simulated AIO we already have? Here's what Inaam Rana, our I/O expert and the author of the AIO patch, says:

  • During recovery typically redo log application is performed by the IO helper threads in the completion routine.
  • With simulated aio the thread waits for IO to complete and then calls the completion routine.
  • With native aio the thread doesn’t have to wait for io to complete, instead it picks a completed request and applies redo to it.

Read more about native AIO here.

You don't have to do anything to take advantage of this feature. It is enabled by default and is used where available, as determined by configure.
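If you want to confirm whether a running server actually ended up using native AIO, checking the corresponding variable is one option; a sketch (the variable is named innodb_use_native_aio in current releases):

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_use_native_aio';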

Here are some details about the test environment:

Hardware: HP DL480, 32G RAM, 2×4 core Intel(R) Xeon(R) CPU E5450  @ 3.00GHz, RAID5, about 1T total storage

Dataset: 1757549 dirty pages, 2808364565 bytes of redo. For the curious, it was a sysbench table with 400 million rows, and the workload I used was random row updates by a simple perl script. Note that this is over 28G worth of dirty pages, which means I had to use the very abusive settings of innodb_buffer_pool_size=29G and innodb_max_dirty_pages_pct=99, given only 32G of RAM. The recovery was done using the same settings, and in the first few attempts the recovery would fail because of what was eventually diagnosed as bug 53122. As it happens, InnoDB recovery uses some memory outside of the buffer pool, and it wanted more of it than was really necessary.

InnoDB configuration parameters:

--innodb-buffer-pool-size=28g
--innodb-log-file-size=2047m
--innodb-adaptive-flushing=0
--innodb-io-capacity=100
--innodb-additional-mem-pool-size=16m
--innodb-log-buffer-size=16m
--innodb-adaptive-hash-index=0
--innodb-flush-log-at-trx-commit=2
--innodb-max-dirty-pages-pct=99

This is a highly artificial setup that targets maximizing the generation of dirty pages and redo, and using as much memory as possible for those dirty pages.

Tuesday Apr 13, 2010

InnoDB recovery is now faster…much faster!

Note: this article was originally published on http://blogs.innodb.com on April 13, 2010 by Inaam Rana.

One of the well-known and much-written-about complaints regarding InnoDB recovery is that it does not scale well on high-end systems. Well, not any more. In InnoDB Plugin 1.0.7 (which is GA) and Plugin 1.1 (which is part of MySQL 5.5.4), this issue has been addressed. Two major improvements, apart from some other minor tweaks, have been made to the recovery code. In this post I'll explain these issues and our solution for them.

The first issue, reported here, is about the available memory check eating up too much CPU. During recovery, the first phase, called the redo scan phase, is where we read the redo logs from disk and store them in a hash table. In the second phase, the redo application phase, these redo log entries are applied to the data pages. The hash table that stores the redo log entries grows in the buffer pool, i.e., memory for the entries is allocated in 16K blocks from the buffer pool. We have to ensure that the hash table does not end up allocating all the memory in the buffer pool, leaving us with no room to read in pages during the redo log application phase. For this we have to keep checking the size of the heap that we are using to allocate the memory for the hash table entries. So why would this kill performance? Because we do not have the total size of the heap available to us; we calculate it by traversing the list of blocks allocated so far. Imagine if we have gigabytes of redo log to apply (it can be up to 4G). That would mean hundreds of thousands of blocks in the heap! And we have to make this check roughly every time we read in a new redo page during the scan. An O(n * m) algorithm, where 'n' is the number of blocks in the heap and 'm' is the number of redo pages that have to be scanned.

What is the solution we came up with? Store the total size of the heap in its header. Simple and effective. Our algorithm now becomes O(m).

Let's talk about the second issue, reported here. During the redo log application phase, data pages to which redo log entries are applied have to be inserted into a list called the flush_list, which is ordered by the LSN of the earliest modification to a page, i.e., oldest_modification. During normal operation the LSN increases monotonically, therefore insertion into the flush_list always happens at the head. But during recovery we have to linearly search the flush_list to find the appropriate spot for insertion. The length to which the flush_list can grow is the number of modified pages (called dirty pages in db parlance) we had at the time of the crash. On a high-end system with a multi-gigabyte buffer pool this number can be very high, i.e., a million or more dirty pages. A linear search for insertion won't scale.

There has been talk in the community about how to fix this; various solutions have been suggested, and some were implemented by the community as well. What we, at InnoDB, have finally implemented is an auxiliary data structure (a red-black tree in this case) which is active only during the recovery phase and is used to speed up sorted insertions into the flush_list. The flush_list remains a list, and after recovery is over the red-black tree is discarded, as during normal operation we only ever append to the flush_list.

So much for the theory. Now let us see if we can walk the talk. To evaluate the effectiveness of this fix, we'd need a crash when there is a lot of redo log to apply and there are a lot of dirty pages. I requested Michael Izioumtchenko (InnoDB's QA supremo) to come up with something. And he did the following:
The dataset was obtained by running a 60-minute sysbench read-write uniform-distribution in-memory workload with a prewarmed cache, using the following configuration parameters:
--innodb-buffer-pool-size=18g
--innodb-log-file-size=2047m
--innodb-adaptive-flushing=0
--innodb-io-capacity=100

The latter two are used to throttle flushing in order to maximize the number of dirty pages.

It took only about 20 min of running the workload to arrive at the test dataset, including cache prewarming.
So at time of crash we had:
Modified db pages  1007907
Redo bytes: 3050455773
And the recovery times were:

Plugin 1.0.7 (also Plugin 1.1): 1m52s scan, 12m04s apply, total 13m56s
Plugin 1.0.6: 31m39s scan, 7h06m21s apply, total 7h38m
1.0.7 (and Plugin 1.1) is better 16.95x on scan, 35.33x on apply, 32.87x overall

Note that all this comes to you transparently. You don't have to set any parameter to take advantage of this feature. My only suggestion would be to use log files as large as you can (there is a limit of 4G on the total log file size). I know users have been using smaller log files to avoid recovery running into hours; they have taken a hit on throughput during normal running to avoid a longer recovery time. You don't have to do that any more. InnoDB recovery will never run into hours. It's a guarantee!

Better Scalability with Multiple Rollback Segments

Note: this article was originally published on http://blogs.innodb.com on April 13, 2010 by Sunny Bains.

Background

The original motivation behind this patch was the infamous Bug#26590: MySQL does not allow more than 1023 open transactions. Actually, the 1024 limit has to do with the number of concurrent update transactions that can run within InnoDB. Where does this magic number come from? 1024 is the total number of UNDO log list slots on one rollback segment header page, and in the past InnoDB created just one rollback segment header page during database creation. This rollback segment header page is anchored in the system header page; there is space there for 128 rollback segments, but only one was being created and used, resulting in the 1024 limit. Each slot in the rollback segment header array comprises {space_id, page_no}, where both space_id and page_no are of type uint32_t. Currently the space id is "unused" and always points to the system tablespace, which is tablespace 0. Now, on to the rollback segment header page. This page contains a rollback segment header (details of which are outside the scope of this blog entry :-) ), followed by an array of 1024 UNDO slots. Each slot is the base node of a file-based linked list of UNDO logs. Each node in this file-based list contains UNDO log records, containing the data updated by a transaction. A single UNDO log node can contain UNDO entries from several different transactions.

Performance ramifications

When a transaction is started, it is allocated a rollback segment to write its modifications to. Multiple transactions can write to the same rollback segment, but only one transaction is allowed to write to any one UNDO slot during its lifetime. This should make clear where the 1024 limit comes from. Each rollback segment is protected by its own mutex, and when we have a single rollback segment this mutex can become a high-contention mutex.

Requirements

Backward compatibility in file formats is something we take very seriously at InnoDB. InnoDB has always had the ability to use up to 128 rollback segments, but before this fix it created only one. We had to figure out a way to make the multiple rollback segments change backward compatible, without breaking any assumptions in the code of older versions of InnoDB about absolute locations of system pages and changes to system data. The 128 limit is a result of the latter: while there is space for 256 rollback segments, InnoDB uses only 7 bits from that field. Once we fix that, we could in the future enable 256 rollback segments; however, 128 seems to be sufficient for now. There are other scalability issues that need to be addressed first before 128K concurrent transactions become an issue :-) .

The solution

To keep backward compatibility, the additional rollback segments are created after the doublewrite buffer when creating the rollback segments in a new instance. We keep the total number of rollback segments set to 128 rather than 256, and set the remaining slots to NULL, because older versions will try to scan up to 256. This means that older versions of InnoDB can also benefit from these extra rollback segments. If you create the extra rollback segments with a newer version but decide to revert to an older version, the older version should be able to use the additional rollback segments. Newer versions of InnoDB that contain this fix will create the additional segments in existing instances only if the innodb_force_recovery flag is not set and the database instance was shut down cleanly. The additional segments are created by default when creating new instances.
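In later MySQL releases the number of rollback segments is also exposed through a server variable; assuming your version has it, you can check it like this (a sketch using the variable name innodb_rollback_segments):

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_rollback_segments';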

Tuesday Aug 11, 2009

New InnoDB Plugin with MORE Performance: Thanks, Community!

Note: this article was originally published on http://blogs.innodb.com on August 11, 2009 by Calvin Sun.

Today, the InnoDB team announced the latest release of the InnoDB Plugin, release 1.0.4. Some of the performance gains in this release are quite remarkable!

As noted in the announcement, this release contains contributions from Sun Microsystems, Google and Percona, Inc., for which we are very appreciative. The purpose of this post is to describe the general approach the InnoDB team takes toward third party contributions.

In principle, we appreciate third-party contributions. However, we simply don't have the resources to seriously evaluate every change that someone proposes. When we do undertake to evaluate a patch, we have some clear criteria in mind:

  • The patch has to be technically sound, reliable, and effective
  • The change should fit with the architecture, and our overall plans and philosophy for InnoDB
  • The contribution must be available to us under a suitable license

Let’s consider, in general terms, what these criteria mean in practice.

We have to expend a fair bit of effort to carefully evaluate and possibly modify a patch before we can include it in the release. Some of the third-party contributions we've seen have not been portable, or have been developed just for Linux. It can take time to find an approach that enables a platform to take advantage of a new feature, even if the platform has the required capabilities. Some of the patches we've evaluated have contained actual bugs that would impact reliability, cause deadlocks, or have other negative implications. InnoDB is a clean and elegant piece of code, yet some of its internal algorithms and behaviors are subtle and complex. Therefore, changes in the "guts" of InnoDB (or any storage engine) must be done carefully and thoroughly tested. Some patches that have been offered make a difference, but only when compared to an inappropriate "baseline". At any given point, we would look to include a patch only if it makes a significant improvement over the "best" version or configuration of InnoDB available at the time. We like to test each patch in isolation, to assess its individual value. This requires some rigorous performance testing, with multiple workloads.

From time to time, third parties have made suggestions for changes that may seem attractive at first, but don’t make sense longer term. In general, we may have a more comprehensive approach to a problem or requirement that we would like to implement, rather than incorporate a patch that would introduce a feature that would ultimately be made obsolete. We prefer to have fewer “knobs” and tuning complexity, so we’re more inclined to implement heuristic, self-tuning capabilities than we are to add new configuration parameters. Lastly, we take care to protect the ability to upgrade and downgrade user databases with the file format management features in the InnoDB Plugin. If a patch requires an on-disk change, we will defer its incorporation until the time comes to implement a new file format.

For us to be able to make continued investment in InnoDB, we must be able to license the software commercially. OEMs and ISVs who incorporate MySQL with InnoDB in their products may not wish to release their products in open source form. Therefore, for each contribution we are to accept, we must have clear legal rights to the change.

Beyond all that, of course, we take care to carefully document each new feature, both in terms of form and function. We try hard to explain the implications of a feature, providing information about what it does, and when, where, and how to use it. And, generally speaking, we are committed to upward compatibility and support of a feature once it is introduced.

It’s pretty clear that the integrity of InnoDB, with its broad adoption and importance everywhere it is used, is paramount to you and to us. You can trust the InnoDB team to protect InnoDB now and in the future, while being open to suggestions and contributions. Let us know if you think we’re doing a good job!

About

This is the InnoDB team blog.
