Sun Studio Compiler Options for MySQL on Solaris 10 SPARC OS : Performance Study


  • Introduction
  • Activity
  • Setup and build environment
  • MySQL Configuration options
  • Studio Compiler flags
  • Studio 11 32-bit (1-16 threads)- sysbench read-only oltp test
  • Recommended compiler options
  • Studio 12 64-bit (1-8 threads)- sysbench read-only oltp test
  • Software documentation links
  • References
  • Acknowledgements

 

Introduction

Solaris 10, Sun's flagship OS, is multi-platform and scalable, and yields massive performance advantages for databases, Web, and Java technology-based services. Its advanced features include security (Process Rights Management), system observability (DTrace), system resource utilization (containers and virtualization), an optimized network stack, data management, system availability (Predictive Self Healing), interoperability tools, and support and services (software subscription, hardware support, technical help).

The Sun Studio compiler collection delivers high-performance, optimizing C, C++, and Fortran compilers for the Solaris OS on SPARC, and for both Solaris and Linux on x86/x64 platforms, including the latest multi-core systems.

Sun Fire™ SPARC servers pack up to four UltraSPARC IV chip multithreading processors, delivering up to eight concurrent threads with 32 GB of memory. Coupled with Solaris 10, these servers are capable of delivering very high levels of throughput for demanding departmental and enterprise applications.

MySQL, the most popular open source database, was developed, distributed, and supported by the commercial company MySQL AB, which is now part of Sun following an acquisition. MySQL is multi-threaded and consists of an SQL server, client programs and libraries, administrative tools, and APIs. Java client programs that use JDBC connections can access a MySQL server via the MySQL Connector/J interface.

The sysbench workload kit is a modular, cross-platform and multi-threaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load.

 

Activity

The objective was to recommend a set of high-performance Studio compiler flags for 32-bit integration with the webstack project. Webstack addresses the OpenSolaris community's needs for web tier technologies. It is a bundle of open source software delivered in Solaris and supported by Sun, and contains software that Sun considers critical to its business.

The MySQL source code was compiled on a Sun Fire™ SPARC system with several sets of Sun Studio compiler flags. The resulting binary for each set was then run against the sysbench workload to measure its throughput.

The recommended flags were integrated into webstack with appropriate MySQL configuration options.

 

Setup and Build environment

The OS Update version used is Solaris 10, Update 4 (s10x_u4wos_11). The C and C++ compilers are part of the Studio compiler collection. The MySQL Community Server version used is 5.0.4x. The sysbench kit version used is v0.3.3.

The MySQL Server and the sysbench kit are installed on a Sun Fire™ SPARC server.

  • Database Node :
    • CPU : 4 core UltraSPARC-IV x 1350 MHz
    • Memory : 32,768 MB
    • Operating System : Solaris 10, Update 4

MySQL Configuration options :

Option                          Possible reason for inclusion
--prefix                        Specifies the installation directory
--xxdir                         Specifies a directory that serves a particular purpose
--with-server-suffix            Adds a suffix to the mysqld version string
--enable-thread-safe-client     Makes mysql_real_connect() thread-safe; recompile the distribution with this option to create a thread-safe client library, libmysqlclient_r
--with-mysqld-libs              Includes extra libraries in mysqld
--with-named-curses=-lcurses    Uses the specified curses libraries instead of those automatically found by configure
--with-client-ldflags=-static   Compiles statically linked client programs
--with-mysql-ldflags=-static    Compiles statically linked server programs
--with-pic                      Tries to use only PIC objects, and omits non-PIC objects
--with-big-tables               Supports tables with more than 4 G (about 4 billion) rows even on 32-bit platforms
--with-yassl                    Enables SSL connections; configures the build to use the bundled yaSSL library
--with-readline                 Do not use system readline or bundled copy
--with-xx-storage-engine        Enables the xx storage engine
--with-innodb                   Includes the InnoDB table handler
--with-extra-charsets=complex   Additionally compiles into the server all character sets that cannot be loaded dynamically
--enable-local-infile           Permits LOAD DATA LOCAL INFILE with files on the client-side file system. This adds flexibility: with LOCAL, no access to the server host is needed except for the MySQL connection
--with-ndb-cluster              Enables support for the NDB Cluster storage engine on applicable platforms
--with-zlib-dir=bundled         Helps the linker find -lz (libz.so) when linking client programs
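
To show how these options combine on one command line, here is a minimal sketch that assembles a configure invocation from a subset of the options in the table. The installation prefix and the server suffix are hypothetical values chosen for illustration, and the options with placeholder names (--xxdir, --with-xx-storage-engine) are left out. The script only prints the command, so it can be tried outside a MySQL source tree.

```shell
#!/bin/sh
# Sketch: build a ./configure command line from options listed in the table.
# PREFIX and SERVER_SUFFIX are hypothetical values, not the ones used in the study.
PREFIX="/opt/mysql-studio11"
SERVER_SUFFIX="-studio11"

CONFIGURE_ARGS="--prefix=$PREFIX"
for opt in \
    "--with-server-suffix=$SERVER_SUFFIX" \
    --enable-thread-safe-client \
    --with-pic \
    --with-big-tables \
    --with-yassl \
    --with-innodb \
    --with-extra-charsets=complex \
    --enable-local-infile \
    --with-zlib-dir=bundled
do
    # Append each option, separated by a single space
    CONFIGURE_ARGS="$CONFIGURE_ARGS $opt"
done

# Print the command instead of running it
echo "./configure $CONFIGURE_ARGS"
```

In a real build, the printed command would be run from the top of the MySQL source tree, with the compiler and its flags supplied through the environment.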

 

Studio Compiler flags :

Compiler Options      Possible reason for inclusion
-m64 or -m32          Specifies the memory model for the compiled binary object, and generates optimal code
-mt                   Macro option that expands to -D_REENTRANT -lthread
-fsimple=1            The optimizer is not allowed to optimize without regard to roundoff or exceptions. A floating-point computation cannot be replaced by one that produces different results with rounding modes held constant at run time. Include this explicitly in the C++ flags
-fns=no               Keeps nonstandard floating-point mode off. With -fns=yes, the compiler selects flush-to-zero mode and, where available, denormals-are-zero mode: subnormal results are flushed to zero and, where available, subnormal operands are treated as zero
-xbuiltin=%all        Improves the optimization of code that calls standard library functions
-xO3                  Generates a high level of optimization
-xstrconst            Inserts string literals into the read-only data section of the text segment
-xlibmil              Selects the appropriate assembly language inline templates for the floating-point option and platform
-xlibmopt             Enables the compiler to use a library of optimized math routines
-xtarget=generic      Specifies the target system for instruction set and optimization. It sets -xarch, -xchip and -xcache
-xrestrict            Tells the compiler that there is no pointer aliasing between the arguments in functions
-xprefetch=auto       Enables automatic generation of prefetch instructions
-xprefetch_level=3    Controls the aggressiveness of automatic insertion of prefetch instructions as set by -xprefetch=auto
-xunroll=2            Suggests to the optimizer to unroll loops n times (here n = 2). Instructions called in multiple iterations are combined into a single iteration. Register usage and code size may increase
-xalias_level         Provides information to the compiler about pointer usage, and enables it to perform type-based alias analysis and optimizations
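
As an illustration of how one flag set might be passed to the build, the sketch below composes CFLAGS and CXXFLAGS for the -xalias_level candidate, which takes a different value for C and for C++. It assumes that each candidate set is appended to the Studio 11 baseline flags; the article does not state this explicitly. cc and CC are the Sun Studio C and C++ compiler drivers.

```shell
#!/bin/sh
# Sketch: compose compiler flags for one candidate flag set.
# Assumption: candidate flags are appended to the Studio 11 baseline flags.
BASELINE="-xlibmil -xO3 -DHAVE_RWLOCK_T -mt -fsimple=1 -fns=no"

# The alias level differs by language: std for C, simple for C++
C_CANDIDATE="-xbuiltin=%all -xalias_level=std"
CXX_CANDIDATE="-xbuiltin=%all -xalias_level=simple"

CC=cc     # Sun Studio C compiler driver
CXX=CC    # Sun Studio C++ compiler driver
CFLAGS="$BASELINE $C_CANDIDATE"
CXXFLAGS="$BASELINE $CXX_CANDIDATE"
export CC CXX CFLAGS CXXFLAGS

# Show what configure would pick up from the environment
echo "CC=$CC CFLAGS=$CFLAGS"
echo "CXX=$CXX CXXFLAGS=$CXXFLAGS"
```

Keeping the baseline in one variable makes it easy to swap in another candidate set without retyping the common flags.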

 

Studio 11 32-bit (1-16 threads) - sysbench read-only oltp test

Throughput is in tps. For rows 2 to 9, the first line of numbers is without the SUNPRO_C source code change, and the second line is with it.

     Threads                        1        2        4        8       16

1. Release binary (Studio 10) : -xO3 -Xa -xstrconst -mt -D_FORTEC_ -xarch=v8 -xc99=none [for C++ , use -noex , and remove -Xa and -xstrconst]
     tps                       181.35   335.26   594.94   914.80   942.06
2. Studio 11 Baseline : -xlibmil -xO3 -DHAVE_RWLOCK_T -mt -fsimple=1 -fns=no
     tps                       183.28   337.34   598.87   833.04   929.94
     tps, SUNPRO_C change      184.22   338.78   597.71   804.49   871.35
3. -xbuiltin=%all
     tps                       186.10   345.30   604.56   812.25   930.23
     tps, SUNPRO_C change      185.59   342.60   603.12   921.53   942.36
4. -xbuiltin=%all -xunroll=2
     tps                       188.53   347.28   613.24   927.57   941.56
     tps, SUNPRO_C change      189.39   349.89   613.07   846.86   888.54
5. -xbuiltin=%all -xprefetch=auto -xprefetch_level=3
     tps                       184.16   343.20   602.33   839.48   943.60
     tps, SUNPRO_C change      186.40   344.76   604.19   813.32   948.48
6. -xbuiltin=%all -xalias_level=std [=simple for C++]
     tps                       186.81   347.56   610.44   817.28   946.79
     tps, SUNPRO_C change      187.77   346.61   611.10   923.46   922.31
7. -xbuiltin=%all -xtarget=native
     tps                       190.70   354.24   619.54   828.70   948.16
     tps, SUNPRO_C change      190.84   353.79   620.02   849.72   898.97
8. -xbuiltin=%all -xunroll=2 -xprefetch=auto -xprefetch_level=3
     tps                       188.21   348.44   614.24   840.94   948.76
     tps, SUNPRO_C change      188.44   348.51   613.37   926.02   942.59
9. -xbuiltin=%all -xunroll=2 -xprefetch=auto -xprefetch_level=3 -xalias_level
     tps                       187.86   348.22   618.12   850.12   941.20
     tps, SUNPRO_C change      189.38   347.57   618.38   851.36   909.14

 

Recommended compiler options for integration with webstack

A.) The recommended Studio 11 compiler flags for webstack on SPARC are -xbuiltin=%all, -xtarget=native and -xunroll=2. With -xtarget=native, a throughput increase of 3.4%-4% was observed over the baseline. With -xunroll=2, a throughput increase of 2.3%-3.6% was observed over the baseline.

B.) The SUNPRO_C source change yielded a throughput increase in two-thirds of the cases, compared with runs without the change. The change can be used on SPARC and x64 platforms. The original MySQL sources explicitly inline small support functions only with gcc and Visual C++. However, this inlining was found to help Sun Studio as well, and can be enabled with the following change to the header file $MYSQL_HOME/innobase/include/univ.i on line 61:

#if !defined(__GNUC__) && !defined(__WIN__) && !defined(__SUNPRO_C)
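
The percentage gains quoted in item A can be reproduced from the Studio 11 table. The sketch below is one way to do the arithmetic, using awk for the floating-point division; the helper name pct_gain is hypothetical. The sample values are the baseline and -xtarget=native throughputs at 1 and 4 threads, without the SUNPRO_C source change.

```shell
#!/bin/sh
# Sketch: relative throughput gain of a flag set over the baseline, in percent.
# pct_gain is a hypothetical helper name used only for this illustration.
pct_gain() {
    # $1 = baseline tps, $2 = tps with the candidate flags
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", (new - base) / base * 100 }'
}

# Studio 11 baseline (row 2) versus -xbuiltin=%all -xtarget=native (row 7)
gain_1_thread=$(pct_gain 183.28 190.70)
gain_4_threads=$(pct_gain 598.87 619.54)

echo "1 thread : $gain_1_thread %"
echo "4 threads: $gain_4_threads %"
```

The same helper can be pointed at any other pair of cells, for example the -xunroll=2 row, to compare a flag set against the baseline at a given thread count.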

 

Studio 12 64-bit (1-8 threads) - sysbench read-only oltp test

Compiler Options (throughput in tps)

     Threads                        1        2        4        8

1. Release binary (64-bit) : -m64 -O2 -mtune=k8 [LDFLAGS=-static-libgcc]
     tps                       182.93   335.54   592.00   902.69
2. Studio 12 (64-bit) : -Xa -fast -m64 -xarch=sparc -xstrconst -mt [for C++, append -noex -fsimple=1 -fns=no and remove -Xa]
     tps                       197.81   346.33   586.13   685.93
3. Feedback Optimization added (FBO) : As in 2 with -xprofile=use:dir
     tps                       223.09   385.44   635.79   719.65
4. FBO + Loop Unrolling : As in 3 with -xunroll=2
     tps                       227.25   388.14   631.89   724.14
5. FBO + Prefetching : As in 3 with -xprefetch=auto -xprefetch_level=3
     tps                       228.64   395.34   651.52   723.02
6. FBO + Restricted Pointer Parameters : As in 3 with -xrestrict=%all
     tps                       228.91   393.56   635.01   736.71

The Studio 12 compiler flags that performed best were -xrestrict=%all and -xprefetch=auto -xprefetch_level=3, each used together with FBO. These combinations gave a throughput increase of 8%-15% over the Studio 12 64-bit baseline without FBO (row 2).
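
Feedback optimization is a two-pass build: the binary is first compiled to collect a profile, a training workload is run, and the sources are then recompiled using the collected profile. The sketch below outlines that sequence for the C flags of row 2. The -xprofile=collect flag is the Sun Studio counterpart of the -xprofile=use flag listed in the table; the profile directory and the exact build commands are assumptions, not taken from the study. The script only prints the steps.

```shell
#!/bin/sh
# Sketch: two-pass feedback-optimized (FBO) build with Sun Studio.
# PROFILE_DIR is a hypothetical location for the collected profile data.
PROFILE_DIR="/tmp/mysql-fbo"
BASE_CFLAGS="-Xa -fast -m64 -xarch=sparc -xstrconst -mt"   # C flags from row 2

# Pass 1 instruments the binary; pass 2 consumes the collected profile
COLLECT_CFLAGS="$BASE_CFLAGS -xprofile=collect:$PROFILE_DIR"
USE_CFLAGS="$BASE_CFLAGS -xprofile=use:$PROFILE_DIR"

# Print the steps instead of running them (assumed build commands)
echo "pass 1: CC=cc CFLAGS=\"$COLLECT_CFLAGS\" ./configure && make"
echo "train : run the sysbench read-only oltp workload against the pass 1 mysqld"
echo "pass 2: make clean && CC=cc CFLAGS=\"$USE_CFLAGS\" ./configure && make"
```

Flags such as -xunroll=2 or -xrestrict=%all from rows 4 to 6 would be appended to both passes, so that the profile matches the code being optimized.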

 

Software documentation links

  • Solaris 10 OS : (here)
  • Sun Studio 12 Compiler Collection : (here)
  • Sun Fire™ Servers : (here)
  • MySQL Database : (here)
  • sysbench site : (here)

About

Krish Shankar
