Wednesday Feb 20, 2008

Sun Studio Compiler Options for MySQL on Solaris 10 x64 OS : Performance Study


  • Introduction
  • Activity
  • Setup and build environment
  • MySQL Configuration options
  • Studio Compiler flags
  • Studio 11 32-bit (1-16 threads)- sysbench read-only oltp test
  • Recommended compiler options for integration with webstack
  • Studio 12 64-bit (1-64 threads)- sysbench and iGen+sysbench read-only oltp test
  • Software documentation links

Introduction

Solaris 10, Sun's flagship OS, is multi-platform and scalable, and yields massive performance advantages for databases, Web, and Java technology-based services. Its advanced features include security (Process Rights Management), system observability (DTrace), system resource utilization (containers and virtualization), an optimized network stack, data management, system availability (Predictive Self Healing), interoperability tools, and support and services (software subscription, hardware support, technical help).

Sun Studio compiler delivers high-performance, optimizing C, C++, and Fortran compilers for the Solaris OS on SPARC, and for both Solaris and Linux on x86/x64 platforms, including the latest multi-core systems.

Sun Fire™ x64 servers yield very high performance, have dual-core AMD Opteron processors and deliver eight-way performance in a four-processor system. Features include near linear CPU scalability, enterprise reliability, high rack density, redundant power and cooling, RAID storage, and remote system monitoring among others. Coupled with Solaris 10, these servers are capable of delivering very high levels of throughput for demanding departmental and enterprise applications.

MySQL, the most popular open source database, was developed, distributed, and supported by the commercial company MySQL AB, which is now part of Sun following an acquisition. MySQL is multi-threaded and consists of an SQL server, client programs and libraries, administrative tools, and APIs. Java client programs that use JDBC connections can access a MySQL server via the MySQL Connector/J interface.

The sysbench workload kit is a modular, cross-platform and multi-threaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load.

The iGen kit is an internally developed benchmark that stresses commit operations and concurrency. Its core metrics are transactions per minute (tpm) and average response time. It has a light SQL load, and most runs are memory-cached executions, so the log device is the I/O component that gets stressed.

 

Activity

The objective was to recommend a set of high-performance Studio compiler flags for 32-bit integration with project webstack. webstack addresses the OpenSolaris community's needs for web tier technologies. It is a bundle of open source software delivered in Solaris and supported by Sun, and contains software that Sun considers critical to its business.

The MySQL source code was compiled on a Sun Fire™ x64 system with several sets of Sun Studio compiler flags. The binary resulting from each set was then run against the sysbench workload to measure throughput.

The recommended flags were integrated into webstack with appropriate MySQL configuration options.

 

Setup and Build environment

The OS update used is Solaris 10, Update 4 (s10x_u4wos_11). The C and C++ compilers are part of the Studio compiler collection. The MySQL Community Server version used is 5.0.4x. The sysbench kit version used is v0.3.3.

The MySQL Server and the sysbench kit are installed on a Sun Fire™ x64 server.

  • Database Node :
    • CPU : 4 core Opteron x 2593 MHz
    • Memory : 16,320 MB
    • Operating System : Solaris 10, Update 4

 

MySQL Configuration options :

Option | Possible reason for inclusion
--prefix | Specifies the installation directory
--xxdir | Specifies a directory for a particular purpose
--with-server-suffix | Adds a suffix to the mysqld version string
--enable-thread-safe-client | Makes mysql_real_connect() thread-safe; recompile the distribution with this option to create a thread-safe client library, libmysqlclient_r
--with-mysqld-libs | Extra libraries to link into mysqld
--with-named-curses=-lcurses | Uses the specified curses libraries instead of those found automatically by configure
--with-client-ldflags=-static | Links the client programs statically
--with-mysqld-ldflags=-static | Links mysqld statically
--with-pic | Tries to use only PIC objects and omits non-PIC objects
--with-big-tables | Supports tables with more than 4 G (about 4 billion) rows, even on 32-bit platforms
--with-yassl | Enables SSL connections using the bundled yaSSL library
--with-readline | Uses the bundled readline copy instead of the system readline
--with-xx-storage-engine | Enables the xx storage engine
--with-innodb | Includes the InnoDB table handler
--with-extra-charsets=complex | Compiles into the server all character sets that cannot be loaded dynamically
--enable-local-infile | Permits LOAD DATA LOCAL INFILE with files on the client-side file system. This adds flexibility: with LOCAL, no access to the server is needed except for the MySQL connection
--with-ndb-cluster | Enables support for the NDB Cluster storage engine on applicable platforms
--with-zlib-dir=bundled | Helps the linker find -lz (libz.so) when linking client programs

 

Studio Compiler flags :

Compiler Options | Possible reason for inclusion
-m64 or -m32 | Specifies the memory model (64-bit or 32-bit) for the compiled binary object
-mt | Macro option that expands to -D_REENTRANT -lthread
-fsimple=1 | Allows conservative floating-point simplifications. The optimizer may not optimize without regard to roundoff or exceptions, and a floating-point computation cannot be replaced by one that produces different results with rounding modes held constant at runtime. Include this explicitly in the C++ flags
-fns=no | Disables the nonstandard floating-point mode (SSE flush-to-zero and, where available, denormals-are-zero), so subnormal results and operands are not flushed to zero
-xbuiltin=%all | Improves the optimization of code that calls standard library functions
-xO3 | Generates a high level of optimization
-xstrconst | Inserts string literals into the read-only data section of the text segment
-xlibmil | Selects the appropriate assembly language inline templates for the floating-point option and platform
-xlibmopt | Enables the compiler to use a library of optimized math routines
-xtarget=generic | Specifies the target system for instruction set and optimization. It sets -xarch, -xchip and -xcache
-xrestrict | Tells the compiler that there is no pointer aliasing between the arguments of functions
-xprefetch=auto | Enables automatic generation of prefetch instructions
-xprefetch_level=3 | Controls the aggressiveness of the automatic insertion of prefetch instructions enabled by -xprefetch=auto
-xunroll=2 | Suggests that the optimizer unroll loops n times (here, 2). Instructions executed in multiple iterations are combined into a single iteration. Register usage and code size may increase
-xalias_level | Provides information to the compiler about pointer usage, and enables it to perform type-based alias analysis and optimizations

 

Studio 11 32-bit (1-16 threads) - sysbench read-only oltp test. Columns are thread counts. Each cell lists tps throughput as "top / bottom", where the bottom number was obtained with the SUNPRO_C source code change; the release binary has a single number per cell.

Compiler Options | 1 | 2 | 4 | 8
1. Release binary (Studio 10) : -xO3 -mt -fsimple=1 -ftrap=%none -nofstore -xbuiltin=%all -xlibmil -xlibmopt -xtarget=generic [for C++, append -features=no%except] | 416.67 | 722.88 | 1337.65 | 1315.38
2. Studio 11 Baseline : -xlibmil -xO3 -DHAVE_RWLOCK_T -mt -fsimple=1 -fns=no | 393.95 / 444.90 | 674.42 / 747.58 | 1143.25 / 1297.07 | 1243.08 / 1284.92
3. -xbuiltin=%all | 400.09 / 443.71 | 692.67 / 758.34 | 1170.63 / 1270.33 | 1219.10 / 1333.03
4. -xbuiltin=%all -xunroll=2 | 400.76 / 440.06 | 682.87 / 751.58 | 1312.33 / 1352.47 | 1233.43 / 1395.44
5. -xbuiltin=%all -xprefetch=auto -xprefetch_level=3 | 403.03 / 437.99 | 693.91 / 739.96 | 1324.03 / 1189.17 | 1246.68 / 1360.16
6. -xbuiltin=%all -xalias_level=std [=simple for C++] | 394.16 / 435.58 | 684.62 / 737.08 | 1125.59 / 1408.82 | 1194.02 / 1403.18
7. -xbuiltin=%all -xtarget=native | 400.84 / 443.89 | 685.57 / 755.54 | 1151.23 / 1234.93 | 1209.17 / 1303.74
8. -xbuiltin=%all -xunroll=2 -xprefetch=auto -xprefetch_level=3 | 400.39 / 443.86 | 694.79 / 747.41 | 1131.03 / 1236.80 | 1261.71 / 1223.67
9. -xbuiltin=%all -xunroll=2 -xprefetch=auto -xprefetch_level=3 -xalias_level | 397.44 / 442.95 | 691.14 / 753.21 | 1119.71 / 1287.13 | 1222.34 / 1281.75

 

Recommended compiler options for integration with webstack

A.) The recommended Studio 11 compiler flags for webstack on AMD64 are '-xbuiltin=%all' and '-xprefetch=auto -xprefetch_level=3'. A throughput increase of 2.3%-3% was observed over the baseline.

B.) The SUNPRO_C source change yielded a consistent throughput increase in most cases (7%-10%) over those run without this change. This option can be used for SPARC and x64 platforms. The original MySQL sources have explicit inlining of small support functions only with gcc and Visual C++. However, this inlining is found to help Sun Studio as well, and can be enabled with the following change to the header file $MYSQL_HOME/innobase/include/univ.i on line 61:

#if !defined(__GNUC__) && !defined(__WIN__) && !defined(__SUNPRO_C)
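
The percentages quoted in points A and B can be recomputed from the Studio 11 table. The following is a minimal Python sketch, not part of the original study: the function names are illustrative, and the only data used are the baseline and prefetch rows of the table. The 2.3%-3% range in point A appears to correspond to the 1- and 2-thread cells without the source change.

```python
# Throughput (tps) copied from the Studio 11 32-bit x64 table.
# Each cell is (without source change, with SUNPRO_C source change).
THREADS = (1, 2, 4, 8)
BASELINE = [(393.95, 444.90), (674.42, 747.58), (1143.25, 1297.07), (1243.08, 1284.92)]
PREFETCH = [(403.03, 437.99), (693.91, 739.96), (1324.03, 1189.17), (1246.68, 1360.16)]


def pct_gain(new, old):
    """Relative throughput change of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


def flag_gains(row, baseline):
    """Gain of a flag set over the baseline per thread count (top cell values)."""
    return [pct_gain(r[0], b[0]) for r, b in zip(row, baseline)]


def source_change_gains(row):
    """Gain from the source change within one row (bottom vs. top cell value)."""
    return [pct_gain(with_change, without) for without, with_change in row]


if __name__ == "__main__":
    for n, g in zip(THREADS, flag_gains(PREFETCH, BASELINE)):
        print(f"{n} thread(s): prefetch flags vs. baseline {g:+.1f}%")
    for n, g in zip(THREADS, source_change_gains(BASELINE)):
        print(f"{n} thread(s): source change on baseline {g:+.1f}%")
```

Swapping in another row of the table gives the corresponding gains for that flag set.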

 

Studio 12 64-bit (1-64 threads) - sysbench read-only oltp test. Columns are thread counts. Each cell lists three tps numbers as "top / middle / bottom": the top number was obtained with a patched compiler, the middle number with an unpatched compiler, and the bottom number with binaries run previously against the iGen workload. The release binary has a single number per cell.

Compiler Options | 1 | 2 | 4 | 8 | 16 | 32 | 64
1. Release binary (64-bit) : -m64 -O2 -mtune=k8 [LDFLAGS=-static-libgcc] | 548.00 | 900.64 | 698.98 | 1588.82 | 1693.18 | 1587.62 | 1583.23
2. Studio 12 (64-bit) : -fast -m64 -DHAVE_RWLOCK_T -mt [for C++, append -fsimple=1 -fns=no] | 479.23 / 484.43 / 478.86 | 803.69 / 800.71 / 804.57 | 1423.95 / 1487.72 / 1246.74 | 1449.46 / 1457.24 / 1380.51 | 1398.36 / 1367.45 / 1455.21 | 1347.89 / 1324.32 / 1333.26 | 1342.14 / 1333.89 / 1324.67
3. Feedback Optimization added (FBO) : -fast -m64 -DHAVE_RWLOCK_T -mt -xprofile=use:dir [for C++, append -fsimple=1 -fns=no] | 522.23 / 527.12 / 525.32 | 861.59 / 862.50 / 862.14 | 1381.42 / 1385.52 / 1592.07 | 1436.56 / 1488.54 / 1595.25 | 1449.39 / 1489.28 / 1570.37 | 1351.54 / 1490.72 / 1406.15 | 1399.92 / 1427.93 / 1444.99
4. FBO + Loop Unrolling : -fast -m64 -DHAVE_RWLOCK_T -mt -xprofile=use:dir [for C++, append -fsimple=1 -fns=no] -xunroll=2 | 519.83 / 521.27 / 527.47 | 862.08 / 857.37 / 867.34 | 1489.89 / 1429.39 / 1605.27 | 1594.95 / 1515.42 / 1597.16 | 1470.04 / 1393.64 / 1572.66 | 1498.09 / 1466.43 / 1429.44 | 1435.28 / 1434.98 / 1438.04
5. FBO + Prefetching : -fast -m64 -DHAVE_RWLOCK_T -mt -xprofile=use:dir [for C++, append -fsimple=1 -fns=no] -xprefetch=auto -xprefetch_level=3 | 515.91 / 523.84 / 526.51 | 866.70 / 860.84 / 868.96 | 1484.00 / 1376.04 / 1342.13 | 1503.10 / 1476.83 / 1529.01 | 1510.76 / 1503.53 / 1575.71 | 1405.84 / 1388.53 / 1485.51 | 1434.42 / 1433.54 / 1452.49
6. FBO + Pointer aliasing (xalias=strong) : -fast -m64 -DHAVE_RWLOCK_T -mt -xprofile=use:dir [for C++, append -fsimple=1 -fns=no] -xalias_level=strong [=simple for C++] | 524.58 / 521.35 / 533.86 | 873.51 / 866.01 / 873.86 | 1354.31 / 1351.95 / 1623.73 | 1457.34 / 1561.47 / 1631.01 | 1538.97 / 1502.79 / 1558.29 | 1521.22 / 1481.15 / 1479.87 | 1451.58 / 1465.48 / 1448.59
7. FBO + Restricted Pointer Parameters : -fast -m64 -DHAVE_RWLOCK_T -mt -xprofile=use:dir [for C++, append -fsimple=1 -fns=no] -xrestrict=%all | 526.07 / 526.80 / 528.67 | 865.49 / 861.65 / 868.88 | 1329.60 / 1343.57 / 1359.18 | 1592.37 / 1317.78 / 1442.33 | 1522.06 / 1409.31 / 1456.71 | 1521.06 / 1410.47 / 1512.98 | 1433.70 / 1411.82 / 1400.65

A.) The runs with the patched compiler yielded a marginal throughput increase in about two thirds of the cases over those with an unpatched compiler.

B.) Patching involved compiler patch 6538437 and a change to the MySQL header file $MYSQL_HOME/innobase/include/univ.i on line 61, to include '__sparc'. The code snippet below determines whether functions get declared as "static inline" or not. It is currently triggered for gcc and Windows; the SPARC compiler will also support this syntax:

#if !defined(__GNUC__) && !defined(__WIN__) && !defined(__sparc)

C.) The binaries previously run with the iGen workload and then run with sysbench yielded an appreciable throughput increase (4% - 7.7%) in about one fourth of the cases over binaries run with sysbench only.
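
Statements such as "about two thirds of the cases" come from tallying cells in the table. The Python sketch below illustrates such a tally for a single row (FBO + Loop Unrolling) only; the function names are hypothetical, and comparing the iGen numbers against the patched-compiler numbers is an assumption about what point C means by "sysbench only".

```python
# One row of the Studio 12 64-bit x64 table (FBO + Loop Unrolling), tps per thread count.
# Each cell: (patched compiler, unpatched compiler, binary previously run with iGen).
FBO_UNROLL = {
    1: (519.83, 521.27, 527.47),
    2: (862.08, 857.37, 867.34),
    4: (1489.89, 1429.39, 1605.27),
    8: (1594.95, 1515.42, 1597.16),
    16: (1470.04, 1393.64, 1572.66),
    32: (1498.09, 1466.43, 1429.44),
    64: (1435.28, 1434.98, 1438.04),
}


def win_fraction(cells, a, b):
    """Fraction of cells in which column `a` is higher than column `b`."""
    wins = sum(1 for cell in cells.values() if cell[a] > cell[b])
    return wins / len(cells)


def gains_at_least(cells, a, b, min_pct):
    """Thread counts where column `a` beats column `b` by at least `min_pct` percent."""
    return [
        n for n, cell in cells.items()
        if (cell[a] - cell[b]) / cell[b] * 100.0 >= min_pct
    ]


if __name__ == "__main__":
    # Column 0 = patched, 1 = unpatched, 2 = iGen-exercised binary.
    print("patched > unpatched:", win_fraction(FBO_UNROLL, 0, 1))
    # Assumption: the iGen number is compared with the patched-compiler number.
    print("iGen gain >= 4% at threads:", gains_at_least(FBO_UNROLL, 2, 0, 4.0))
```

Repeating the tally over rows 2 to 7 gives the fractions for the whole table.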

 

Software documentation links

  • Solaris 10 OS : (here)
  • Sun Studio 12 Compiler Collection : (here)
  • Sun Fire™ SPARC and x64 servers : (here)
  • MySQL Database : (here)
  • sysbench site : (here)

Sun Studio Compiler Options for MySQL on Solaris 10 SPARC OS : Performance Study


  • Introduction
  • Activity
  • Setup and build environment
  • MySQL Configuration options
  • Studio Compiler flags
  • Studio 11 32-bit (1-16 threads)- sysbench read-only oltp test
  • Recommended compiler options
  • Studio 12 64-bit (1-8 threads)- sysbench read-only oltp test
  • Software documentation links
  • References
  • Acknowledgements

 

Introduction

Solaris 10, Sun's flagship OS, is multi-platform and scalable, and yields massive performance advantages for databases, Web, and Java technology-based services. Its advanced features include security (Process Rights Management), system observability (DTrace), system resource utilization (containers and virtualization), an optimized network stack, data management, system availability (Predictive Self Healing), interoperability tools, and support and services (software subscription, hardware support, technical help).

Sun Studio compiler delivers high-performance, optimizing C, C++, and Fortran compilers for the Solaris OS on SPARC, and for both Solaris and Linux on x86/x64 platforms, including the latest multi-core systems.

Sun Fire™ SPARC servers pack up to four UltraSPARC IV chip multithreading (CMT) processors, delivering up to eight concurrent threads with 32 GB of memory. Coupled with Solaris 10, these servers are capable of delivering very high levels of throughput for demanding departmental and enterprise applications.

MySQL, the most popular open source database, was developed, distributed, and supported by the commercial company MySQL AB, which is now part of Sun following an acquisition. MySQL is multi-threaded and consists of an SQL server, client programs and libraries, administrative tools, and APIs. Java client programs that use JDBC connections can access a MySQL server via the MySQL Connector/J interface.

The sysbench workload kit is a modular, cross-platform and multi-threaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load.

 

Activity

The objective was to recommend a set of high-performance Studio compiler flags for 32-bit integration with project webstack. webstack addresses the OpenSolaris community's needs for web tier technologies. It is a bundle of open source software delivered in Solaris and supported by Sun, and contains software that Sun considers critical to its business.

The MySQL source code was compiled on a Sun Fire™ SPARC system with several sets of Sun Studio compiler flags. The binary resulting from each set was then run against the sysbench workload to measure throughput.

The recommended flags were integrated into webstack with appropriate MySQL configuration options.

 

Setup and Build environment

The OS update used is Solaris 10, Update 4 (s10x_u4wos_11). The C and C++ compilers are part of the Studio compiler collection. The MySQL Community Server version used is 5.0.4x. The sysbench kit version used is v0.3.3.

The MySQL Server and the sysbench kit are installed on a Sun Fire™ SPARC server.

  • Database Node :
    • CPU : 4 core UltraSPARC-IV x 1350 MHz
    • Memory : 32,768 MB
    • Operating System : Solaris 10, Update 4

MySQL Configuration options :

Option | Possible reason for inclusion
--prefix | Specifies the installation directory
--xxdir | Specifies a directory for a particular purpose
--with-server-suffix | Adds a suffix to the mysqld version string
--enable-thread-safe-client | Makes mysql_real_connect() thread-safe; recompile the distribution with this option to create a thread-safe client library, libmysqlclient_r
--with-mysqld-libs | Extra libraries to link into mysqld
--with-named-curses=-lcurses | Uses the specified curses libraries instead of those found automatically by configure
--with-client-ldflags=-static | Links the client programs statically
--with-mysqld-ldflags=-static | Links mysqld statically
--with-pic | Tries to use only PIC objects and omits non-PIC objects
--with-big-tables | Supports tables with more than 4 G (about 4 billion) rows, even on 32-bit platforms
--with-yassl | Enables SSL connections using the bundled yaSSL library
--with-readline | Uses the bundled readline copy instead of the system readline
--with-xx-storage-engine | Enables the xx storage engine
--with-innodb | Includes the InnoDB table handler
--with-extra-charsets=complex | Compiles into the server all character sets that cannot be loaded dynamically
--enable-local-infile | Permits LOAD DATA LOCAL INFILE with files on the client-side file system. This adds flexibility: with LOCAL, no access to the server is needed except for the MySQL connection
--with-ndb-cluster | Enables support for the NDB Cluster storage engine on applicable platforms
--with-zlib-dir=bundled | Helps the linker find -lz (libz.so) when linking client programs

 

Studio Compiler flags :

Compiler Options | Possible reason for inclusion
-m64 or -m32 | Specifies the memory model (64-bit or 32-bit) for the compiled binary object
-mt | Macro option that expands to -D_REENTRANT -lthread
-fsimple=1 | Allows conservative floating-point simplifications. The optimizer may not optimize without regard to roundoff or exceptions, and a floating-point computation cannot be replaced by one that produces different results with rounding modes held constant at runtime. Include this explicitly in the C++ flags
-fns=no | Disables the SPARC nonstandard floating-point mode, so subnormal results and operands are not flushed to zero
-xbuiltin=%all | Improves the optimization of code that calls standard library functions
-xO3 | Generates a high level of optimization
-xstrconst | Inserts string literals into the read-only data section of the text segment
-xlibmil | Selects the appropriate assembly language inline templates for the floating-point option and platform
-xlibmopt | Enables the compiler to use a library of optimized math routines
-xtarget=generic | Specifies the target system for instruction set and optimization. It sets -xarch, -xchip and -xcache
-xrestrict | Tells the compiler that there is no pointer aliasing between the arguments of functions
-xprefetch=auto | Enables automatic generation of prefetch instructions
-xprefetch_level=3 | Controls the aggressiveness of the automatic insertion of prefetch instructions enabled by -xprefetch=auto
-xunroll=2 | Suggests that the optimizer unroll loops n times (here, 2). Instructions executed in multiple iterations are combined into a single iteration. Register usage and code size may increase
-xalias_level | Provides information to the compiler about pointer usage, and enables it to perform type-based alias analysis and optimizations

 

Studio 11 32-bit (1-16 threads) - sysbench read-only oltp test. Columns are thread counts. Each cell lists tps throughput as "top / bottom", where the bottom number was obtained with the SUNPRO_C source code change; the release binary has a single number per cell.

Compiler Options | 1 | 2 | 4 | 8 | 16
1. Release binary (Studio 10) : -xO3 -Xa -xstrconst -mt -D_FORTEC_ -xarch=v8 -xc99=none [for C++, use -noex, and remove -Xa and -xstrconst] | 181.35 | 335.26 | 594.94 | 914.80 | 942.06
2. Studio 11 Baseline : -xlibmil -xO3 -DHAVE_RWLOCK_T -mt -fsimple=1 -fns=no | 183.28 / 184.22 | 337.34 / 338.78 | 598.87 / 597.71 | 833.04 / 804.49 | 929.94 / 871.35
3. -xbuiltin=%all | 186.10 / 185.59 | 345.30 / 342.60 | 604.56 / 603.12 | 812.25 / 921.53 | 930.23 / 942.36
4. -xbuiltin=%all -xunroll=2 | 188.53 / 189.39 | 347.28 / 349.89 | 613.24 / 613.07 | 927.57 / 846.86 | 941.56 / 888.54
5. -xbuiltin=%all -xprefetch=auto -xprefetch_level=3 | 184.16 / 186.40 | 343.20 / 344.76 | 602.33 / 604.19 | 839.48 / 813.32 | 943.60 / 948.48
6. -xbuiltin=%all -xalias_level=std [=simple for C++] | 186.81 / 187.77 | 347.56 / 346.61 | 610.44 / 611.10 | 817.28 / 923.46 | 946.79 / 922.31
7. -xbuiltin=%all -xtarget=native | 190.70 / 190.84 | 354.24 / 353.79 | 619.54 / 620.02 | 828.70 / 849.72 | 948.16 / 898.97
8. -xbuiltin=%all -xunroll=2 -xprefetch=auto -xprefetch_level=3 | 188.21 / 188.44 | 348.44 / 348.51 | 614.24 / 613.37 | 840.94 / 926.02 | 948.76 / 942.59
9. -xbuiltin=%all -xunroll=2 -xprefetch=auto -xprefetch_level=3 -xalias_level | 187.86 / 189.38 | 348.22 / 347.57 | 618.12 / 618.38 | 850.12 / 851.36 | 941.20 / 909.14

 

Recommended compiler options for integration with webstack

A.) The recommended Studio 11 compiler flags for webstack on SPARC are '-xbuiltin=%all' combined with either '-xtarget=native' or '-xunroll=2'. With -xtarget, a throughput increase of 3.4%-4% was observed over the baseline. With -xunroll, a throughput increase of 2.3%-3.6% was observed over the baseline.

B.) The SUNPRO_C source change yielded a throughput increase in two-thirds of the cases over those run without this change. This option can be used for SPARC and x64 platforms. The original MySQL sources have explicit inlining of small support functions only with gcc and Visual C++. However, this inlining is found to help Sun Studio as well, and can be enabled with the following change to the header file $MYSQL_HOME/innobase/include/univ.i on line 61:

#if !defined(__GNUC__) && !defined(__WIN__) && !defined(__SUNPRO_C)

 

Studio 12 64-bit (1-8 threads) - sysbench read-only oltp test. Columns are thread counts; cells show tps throughput.

Compiler Options | 1 | 2 | 4 | 8
1. Release binary (64-bit) : -m64 -O2 -mtune=k8 [LDFLAGS=-static-libgcc] | 182.93 | 335.54 | 592.00 | 902.69
2. Studio 12 (64-bit) : -Xa -fast -m64 -xarch=sparc -xstrconst -mt [for C++, append -noex -fsimple=1 -fns=no and remove -Xa] | 197.81 | 346.33 | 586.13 | 685.93
3. Feedback Optimization added (FBO) : As in 2 with -xprofile=use:dir | 223.09 | 385.44 | 635.79 | 719.65
4. FBO + Loop Unrolling : As in 3 with -xunroll=2 | 227.25 | 388.14 | 631.89 | 724.14
5. FBO + Prefetching : As in 3 with -xprefetch=auto -xprefetch_level=3 | 228.64 | 395.34 | 651.52 | 723.02
6. FBO + Restricted Pointer Parameters : As in 3 with -xrestrict=%all | 228.91 | 393.56 | 635.01 | 736.71

The Studio 12 compiler flags that performed best were '-xrestrict=%all' and '-xprefetch=auto -xprefetch_level=3', each used together with FBO. These combinations gave a throughput increase of 8% - 15% over the 64-bit Studio 12 baseline (row 2, without FBO).
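
As a rough cross-check, the best FBO variant at each thread count and its gain over the Studio 12 baseline can be picked out of the table programmatically. This is a minimal Python sketch using only the table values; the row labels and the function name are illustrative, not part of the original study.

```python
# tps values copied from the SPARC Studio 12 64-bit table, columns = 1, 2, 4, 8 threads.
THREADS = (1, 2, 4, 8)
BASELINE = (197.81, 346.33, 586.13, 685.93)  # row 2: Studio 12, no FBO
FBO_ROWS = {
    "FBO": (223.09, 385.44, 635.79, 719.65),
    "FBO + -xunroll=2": (227.25, 388.14, 631.89, 724.14),
    "FBO + prefetch": (228.64, 395.34, 651.52, 723.02),
    "FBO + -xrestrict=%all": (228.91, 393.56, 635.01, 736.71),
}


def best_per_thread_count(rows, baseline):
    """Return [(label, gain_pct)]: the highest-tps row in each thread-count column."""
    result = []
    for col, base in enumerate(baseline):
        # Pick the row label with the highest throughput in this column.
        label = max(rows, key=lambda name: rows[name][col])
        gain = (rows[label][col] - base) / base * 100.0
        result.append((label, gain))
    return result


if __name__ == "__main__":
    for n, (label, gain) in zip(THREADS, best_per_thread_count(FBO_ROWS, BASELINE)):
        print(f"{n} thread(s): best = {label}, {gain:+.1f}% over baseline")
```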

 

Software documentation links

  • Solaris 10 OS : (here)
  • Sun Studio 12 Compiler Collection : (here)
  • Sun Fire™ Servers : (here)
  • MySQL Database : (here)
  • sysbench site : (here)

About

Krish Shankar
