Sun Studio Compiler Options for MySQL on Solaris 10 x64 OS : Performance Study
By Krish Shankar on Feb 20, 2008
- Setup and build environment
- MySQL Configuration options
- Studio Compiler flags
- Studio 11 32-bit (1-16 threads)- sysbench read-only oltp test
- Recommended compiler options for integration with webstack
- Studio 12 64-bit (1-64 threads)- sysbench and iGen+sysbench read-only oltp test
- Software documentation links
Solaris 10, Sun's flagship OS, is multi-platform and scalable, and yields massive performance advantages for databases, Web, and Java technology-based services. Its advanced features include security (Process Rights Management), system observability (DTrace), system resource utilization (containers and virtualization), an optimized network stack, data management, system availability (Predictive Self Healing), interoperability tools, and support and services (software subscription, hardware support, technical help).
The Sun Studio compiler collection delivers high-performance, optimizing C, C++, and Fortran compilers for the Solaris OS on SPARC, and for both Solaris and Linux on x86/x64 platforms, including the latest multi-core systems.
Sun Fire™ x64 servers yield very high performance, have dual-core AMD Opteron processors and deliver eight-way performance in a four-processor system. Features include near linear CPU scalability, enterprise reliability, high rack density, redundant power and cooling, RAID storage, and remote system monitoring among others. Coupled with Solaris 10, these servers are capable of delivering very high levels of throughput for demanding departmental and enterprise applications.
MySQL, the most popular open source database, was developed, distributed, and supported by the commercial company MySQL AB, which is now part of Sun following an acquisition. MySQL is multi-threaded and consists of an SQL server, client programs and libraries, administrative tools, and APIs. Java client programs that use JDBC connections can access a MySQL server via the MySQL Connector/J interface.
The sysbench workload kit is a modular, cross-platform and multi-threaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load.
The iGen kit is an internally developed benchmark that stresses commit operations and concurrency. Its core metrics are transactions per minute (tpm) and average response time. It has a light SQL load, and most runs are memory "cached" executions, so the log device is the I/O component that gets stressed.
The objective was to recommend a set of high performance Studio compiler flags for 32-bit integration with project webstack. webstack addresses the OpenSolaris community needs for web tier technologies. It is a bundle of open source software delivered in Solaris and supported by Sun, and contains software that Sun considers critical to its business.
The MySQL source code was compiled on a Sun Fire™ x64 system with several sets of Sun Studio compiler flags. The resulting binary for each set was then run against the sysbench workload to measure throughput.
The recommended flags were integrated into webstack with the appropriate MySQL configuration options.
The OS update version used is Solaris 10, Update 4 (s10x_u4wos_11). The C and C++ compilers are part of the Studio compiler collection. The MySQL Community Server version used is 5.0.4x. The sysbench kit version used is v0.3.3.
The MySQL Server and the sysbench kit are installed on a Sun Fire™ x64 server.
- Database Node :
- CPU : 4 core Opteron x 2593 MHz
- Memory : 16,320 MB
- Operating System : Solaris 10, Update 4
|Option||Possible reason for inclusion|
|--prefix||Specify installation dir.|
|--xxdir||Specify a directory for serving a purpose|
|--with-server-suffix||Adds a suffix to the mysqld version string|
|--enable-thread-safe-client||Make mysql_real_connect() thread-safe with this option, and recompile the distribution to create a thread-safe client library, libmysqlclient_r|
|--with-mysqld-libs||Include libs in mysqld|
|--with-named-curses=-lcurses||Use specified curses libraries instead of those automatically found by configure|
|--with-client-ldflags=-static||compile statically linked programs|
|--with-mysql-ldflags=-static||compile statically linked programs|
|--with-pic||try to use only PIC objects, and omit usage of non-PIC objects|
|--with-big-tables||Support tables with more than 4 G (about 4 billion) rows even on 32-bit platforms|
|--with-yassl||To use SSL connections; configure to use the bundled yaSSL library|
|--with-readline||Use the readline copy bundled with the MySQL source instead of the system readline|
|--with-xx-storage-engine||Enable the xx Storage Engine|
|--with-innodb||Include the InnoDB table handler|
|--with-extra-charsets=complex||Additionally compile into the server all character sets that cannot be loaded dynamically|
|--enable-local-infile||Permits usage of LOAD DATA LOCAL INFILE with files on the client-side file system. This adds flexibility: with LOCAL, no access to the server host is needed other than the MySQL connection|
|--with-ndb-cluster||Enables support for the ndb cluster storage engine on applicable platforms|
|--with-zlib-dir=bundled||Helps the linker find -lz (libz.so) when linking client programs|
|Compiler Options||Possible reason for inclusion|
|-m64 or -m32||Specifies the memory model for the compiled binary object, and generates optimal code.|
|-mt||Macro option that expands to -D_REENTRANT -lthread|
|-fsimple=1||The optimizer is not allowed to optimize completely without regard to roundoff or exceptions. A floating-point computation cannot be replaced by one that produces different results with rounding modes held constant at runtime. Include this explicitly in the C++ flags|
|-fns=no||Deselects SSE flush-to-zero mode and, where available, denormals-are-zero mode, so subnormal results and operands are handled normally (with -fns=yes, subnormal results are flushed to zero and, where available, subnormal operands are treated as zero)|
|-xbuiltin=%all||Improves the optimization of code that calls standard library functions|
|-xO3||Generates a high level of optimization.|
|-xstrconst||Inserts string literals into the read-only data section of the text segment|
|-xlibmil||Selects the appropriate assembly language inline templates for the floating-point option and platform|
|-xlibmopt||Enables the compiler to use a library of optimized math routines.|
|-xtarget=generic||Specifies the target system for instruction set and optimization. It sets -xarch, -xchip and -xcache|
|-xrestrict||Tells the compiler that there is no pointer aliasing between the arguments in functions|
|-xprefetch=auto||Enables prefetch instructions|
|-xprefetch_level=3||Controls the aggressiveness of automatic insertion of prefetch instructions as set by -xprefetch=auto|
|-xunroll=2||Suggests to the optimizer that it unroll loops n times (here, n=2). Instructions from multiple iterations are combined into a single iteration. Register usage and code size may increase.|
|-xalias_level||Provides information to the compiler about pointer usage, and enables it to perform type-based alias analysis and optimizations.|
Studio 11 32-bit (1-16 threads) - sysbench read-only oltp test
|1. Release binary (Studio 10) : -xO3 -mt -fsimple=1 -ftrap=%none -nofstore -xbuiltin=%all -xlibmil -xlibmopt -xtarget=generic [for C++ , append -features=no%except]||416.67||722.88||1337.65||1315.38|
|2. Studio 11 Baseline : -xlibmil -xO3 -DHAVE_RWLOCK_T -mt -fsimple=1 -fns=no||393.95 444.90||674.42 747.58||1143.25 1297.07||1243.08 1284.92|
|3. -xbuiltin=%all||400.09 443.71||692.67 758.34||1170.63 1270.33||1219.10 1333.03|
|4. -xbuiltin=%all -xunroll=2||400.76 440.06||682.87 751.58||1312.33 1352.47||1233.43 1395.44|
|5. -xbuiltin=%all -xprefetch=auto -xprefetch_level=3||403.03 437.99||693.91 739.96||1324.03 1189.17||1246.68 1360.16|
|6. -xbuiltin=%all -xalias_level=std [=simple for C++]||394.16 435.58||684.62 737.08||1125.59 1408.82||1194.02 1403.18|
|7. -xbuiltin=%all -xtarget=native||400.84 443.89||685.57 755.54||1151.23 1234.93||1209.17 1303.74|
|8. -xbuiltin=%all -xunroll=2 -xprefetch=auto -xprefetch_level=3||400.39 443.86||694.79 747.41||1131.03 1236.80||1261.71 1223.67|
|9. -xbuiltin=%all -xunroll=2 -xprefetch=auto -xprefetch_level=3 -xalias_level||397.44 442.95||691.14 753.21||1119.71 1287.13||1222.34 1281.75|
A.) The recommended Studio 11 compiler flags for webstack on AMD64 are '-xbuiltin=%all' and '-xprefetch=auto -xprefetch_level=3'. A throughput increase of 2.3%-3% was observed over the baseline.
B.) The __SUNPRO_C source change yielded a throughput increase of 7%-10% in most cases over runs without this change. The change can be used on both SPARC and x64 platforms. The original MySQL sources explicitly inline small support functions only with gcc and Visual C++. However, this inlining was found to help Sun Studio as well, and it can be enabled with the following change to the header file $MYSQL_HOME/innobase/include/univ.i on line 61:
#if !defined(__GNUC__) && !defined(__WIN__) && !defined(__SUNPRO_C)
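The effect of such a preprocessor gate can be illustrated with a minimal sketch. The names `DEMO_MUST_NOT_INLINE`, `DEMO_INLINE`, and `demo_max` are hypothetical and are not taken from the MySQL sources; only the compiler-detection test mirrors the condition quoted above. Assuming a C99 compiler, adding `__SUNPRO_C` to the test lets Sun Studio take the inlined path as well:

```c
/* Sketch only: the DEMO_* names are hypothetical, not MySQL identifiers. */

/* Compilers not named here fall back to ordinary, non-inlined functions. */
#if !defined(__GNUC__) && !defined(__WIN__) && !defined(__SUNPRO_C)
#define DEMO_MUST_NOT_INLINE
#endif

#ifdef DEMO_MUST_NOT_INLINE
#define DEMO_INLINE static          /* plain function, called at each use */
#else
#define DEMO_INLINE static inline   /* small helper may be expanded in place */
#endif

/* A small support function of the kind that benefits from inlining. */
DEMO_INLINE unsigned long demo_max(unsigned long a, unsigned long b)
{
    return a > b ? a : b;
}
```

Either way the function behaves identically; the gate only decides whether the compiler is invited to expand the body at each call site.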
Studio 12 64-bit (1-64 threads) - sysbench read-only oltp test
In each cell, the first (top) tps number was obtained using a patched compiler, the second (middle) tps number corresponds to a run with an unpatched compiler, and the third (bottom) tps number was obtained with binaries previously run against the iGen workload.
|Compiler Options||1||2||4||8||16||32||64|
|1. Release binary (64-bit) : -m64 -O2 -mtune=k8 [LDFLAGS=-static-libgcc]||548.00||900.64||698.98||1588.82||1693.18||1587.62||1583.23|
|2. Studio 12 (64-bit): -fast -m64 -DHAVE_RWLOCK_T -mt [for C++, append -fsimple=1 -fns=no]||479.23 484.43 478.86||803.69 800.71 804.57||1423.95 1487.72 1246.74||1449.46 1457.24 1380.51||1398.36 1367.45 1455.21||1347.89 1324.32 1333.26||1342.14 1333.89 1324.67|
|3. Feedback Optimization added (FBO): -fast -m64 -DHAVE_RWLOCK_T -mt -xprofile=use:dir [for C++, append -fsimple=1 -fns=no]||522.23 527.12 525.32||861.59 862.50 862.14||1381.42 1385.52 1592.07||1436.56 1488.54 1595.25||1449.39 1489.28 1570.37||1351.54 1490.72 1406.15||1399.92 1427.93 1444.99|
|4. FBO + Loop Unrolling : -fast -m64 -DHAVE_RWLOCK_T -mt -xprofile=use:dir [for C++, append -fsimple=1 -fns=no] -xunroll=2||519.83 521.27 527.47||862.08 857.37 867.34||1489.89 1429.39 1605.27||1594.95 1515.42 1597.16||1470.04 1393.64 1572.66||1498.09 1466.43 1429.44||1435.28 1434.98 1438.04|
|5. FBO + Prefetching : -fast -m64 -DHAVE_RWLOCK_T -mt -xprofile=use:dir [for C++, append -fsimple=1 -fns=no] -xprefetch=auto -xprefetch_level=3||515.91 523.84 526.51||866.70 860.84 868.96||1484.00 1376.04 1342.13||1503.10 1476.83 1529.01||1510.76 1503.53 1575.71||1405.84 1388.53 1485.51||1434.42 1433.54 1452.49|
|6. FBO + Pointer aliasing (xalias=strong) : -fast -m64 -DHAVE_RWLOCK_T -mt -xprofile=use:dir [for C++, append -fsimple=1 -fns=no] -xalias_level=strong [=simple for C++]||524.58 521.35 533.86||873.51 866.01 873.86||1354.31 1351.95 1623.73||1457.34 1561.47 1631.01||1538.97 1502.79 1558.29||1521.22 1481.15 1479.87||1451.58 1465.48 1448.59|
|7. FBO + Restricted Pointer Parameters : -fast -m64 -DHAVE_RWLOCK_T -mt -xprofile=use:dir [for C++, append -fsimple=1 -fns=no] -xrestrict=%all||526.07 526.80 528.67||865.49 861.65 868.88||1329.60 1343.57 1359.18||1592.37 1317.78 1442.33||1522.06 1409.31 1456.71||1521.06 1410.47 1512.98||1433.70 1411.82 1400.65|
A.) The runs with the patched compiler yielded a marginal throughput increase in about two thirds of the cases over those with an unpatched compiler.
B.) Patching involved compiler patch 6538437 and a change to the MySQL header file $MYSQL_HOME/innobase/include/univ.i on line 61 to include '__sparc'. The preprocessor test below determines whether functions get declared as "static inline" or not. It is currently triggered for gcc and Windows; the SPARC compiler supports this syntax as well:
#if !defined(__GNUC__) && !defined(__WIN__) && !defined(__sparc)
C.) Binaries that had previously been run with the iGen workload and were then run with sysbench yielded an appreciable throughput increase (4%-7.7%) in about one fourth of the cases over binaries run with sysbench only.
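The percentages and fractions quoted in these observations are simple arithmetic over the tps cells. As a hedged sketch (the function names `pct_change` and `count_wins` are illustrative and not part of sysbench or any other toolkit), the helpers below compute a relative throughput change and count how often one series beats another. The two arrays are copied from the Studio 12 "FBO + Loop Unrolling" row, taking the first (patched) and second (unpatched) value of each cell:

```c
#include <stddef.h>

/* Relative change of `candidate` over `baseline`, in percent. */
double pct_change(double baseline, double candidate)
{
    return (candidate - baseline) / baseline * 100.0;
}

/* Number of positions where a[i] is strictly greater than b[i]. */
size_t count_wins(const double *a, const double *b, size_t n)
{
    size_t wins = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > b[i]) {
            wins++;
        }
    }
    return wins;
}

/* Studio 12 row 4 (FBO + Loop Unrolling), 1 to 64 threads, from the table. */
const double row4_patched[7] = {
    519.83, 862.08, 1489.89, 1594.95, 1470.04, 1498.09, 1435.28
};
const double row4_unpatched[7] = {
    521.27, 857.37, 1429.39, 1515.42, 1393.64, 1466.43, 1434.98
};
```

For example, `count_wins(row4_patched, row4_unpatched, 7)` returns 6, meaning the patched compiler is ahead at six of the seven thread counts in that row. Likewise, `pct_change(393.95, 403.03)`, using the first single-thread values of rows 2 and 5 of the Studio 11 results, evaluates to roughly 2.3, the lower end of the range reported for the recommended Studio 11 flags.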
- Solaris 10 OS : (here)
- Sun Studio 12 Compiler Collection : (here)
- Sun Fire™ SPARC and x64 servers : (here)
- MySQL Database : (here)
- sysbench site : (here)