Thursday May 26, 2011

x86 Optimization for Oracle Solaris Applications

Resources for x86 Optimization of Oracle Solaris Applications

As Oracle Solaris tracks Intel's innovations in x86 architecture, customers can be assured of high performance and mission-critical operation across Oracle's line of Intel based systems. Oracle Solaris has a single source base supporting both x86 and SPARC platforms; a number of architecture-specific enhancements ensure the best in performance, robustness and reliability.

  • Latest instruction set support in Studio compilers
  • Self Healing for x86 CPUs
  • Kernel-level power management
  • Support for high-bandwidth interconnects
  • Optimization for Intel's QuickPath Interconnect
  • Integration and test with entire Oracle Product Family


Sunday Jun 29, 2008

Tracing PHP Applications Using DTrace

In my previous post, I showed how you can run the NetBeans-bundled sample PHP application, AirAlliance. When you are developing PHP applications on Solaris/OpenSolaris, you can use DTrace to debug your applications.

I have written a short write-up explaining how you can trace the AirAlliance sample application using DTrace in OpenSolaris 2008.05. But the information is applicable to any PHP application you run on Solaris/OpenSolaris.

Read on.

Tuesday Mar 25, 2008

Tracing PHP Function Calls Using DTrace

In NetBeans, you can use the DTrace plug-in to analyze the performance of your PHP applications. A DTrace provider for PHP that adds probes to function entry and exit points has been available through Cool Stack. The support is also provided through Web Stack, if you are using SXDE 1/08.

To get started, follow these tutorials:
1. Configure NetBeans for DTrace.
2. Get the PHP DTrace Extension.

Once you are done with the steps provided in the above articles, start your Apache web server and check the probes.

bash-3.2# dtrace -l | grep php

17952    php1895       php_dtrace_execute function-entry
17953    php1895       php_dtrace_execute_internal function-entry
17954    php1895       php_dtrace_execute function-return
17955    php1895       php_dtrace_execute_internal function-return
17956    php1896       php_dtrace_execute function-entry

I'm running a simple PHP application that has some function calls. Go to the DTrace window in the NetBeans IDE and start the script.

Here is my output. The output shows which function is being invoked and the invocation time.

My example has only a few SOAP calls. If you have a complex application with DB calls, this script will be useful to analyze the performance bottlenecks in your application.

Wednesday Jan 09, 2008

Sun HPC ClusterTools 7.1 Released

Sun HPC ClusterTools 7.1 is now available!

Sun HPC ClusterTools 7.1 is an update release based on Open MPI 1.2.4. Included in Sun HPC ClusterTools 7.1 are Intel 32- and 64-bit support, improved parallel debugger support, PBS Pro validation, improved memory usage for communication, and other bug fixes by the Open MPI community since Open MPI 1.2 was first released.

Sun HPC ClusterTools 7.1 software supports:

  • Sun UltraSPARC III or greater, Opteron, or Intel systems
  • Solaris 10 11/06 Operating System
  • Sun Studio 10, 11 and 12
  • Shared memory, TCP, and Infiniband communication
  • The full MPI-2 specifications
  • Available for download from the Sun HPC ClusterTools site

    Open MPI is an open source implementation of the MPI parallel programming API.
    » Learn more about Open MPI

    Wednesday Jan 02, 2008

    NEW BOOK: Solaris Application Programming

    Solaris Application Programming, by Sun engineer Darryl Gove, has just been published by Sun Press/Prentice Hall.

    Here's the back-of-the-jacket blurb:

    Solaris™ Application Programming is a comprehensive guide to optimizing the performance of applications running in your Solaris environment. From the fundamentals of system performance to using analysis and optimization tools to their fullest, this wide-ranging resource shows developers and software architects how to get the most from Solaris systems and applications.

    Whether you’re new to performance analysis and optimization or an experienced developer searching for the most efficient ways to solve performance issues, this practical guide gives you the background information, tips, and techniques for developing, optimizing, and debugging applications on Solaris.

    The text begins with a detailed overview of the components that affect system performance. This is followed by explanations of the many developer tools included with Solaris OS and the Sun Studio compiler, and then it takes you beyond the basics with practical, real-world examples. In addition, you will learn how to use the rich set of developer tools to identify performance problems, accurately interpret output from the tools, and choose the smartest, most efficient approach to correcting specific problems and achieving maximum system performance.

    Coverage includes

    • A discussion of the chip multithreading (CMT) processors from Sun and how they change the way that developers need to think about performance
    • A detailed introduction to the performance analysis and optimization tools included with the Solaris OS and Sun Studio compiler
    • Practical examples for using the developer tools to their fullest, including informational tools, compilers, floating point optimizations, libraries and linking, performance profilers, and debuggers
    • Guidelines for interpreting tool analysis output
    • Optimization, including hardware performance counter metrics and source code optimizations
    • Techniques for improving application performance using multiple processes, or multiple threads
    • An overview of hardware and software components that affect system performance, including coverage of SPARC and x64 processors


      You can get it at Powell's Books online (my favorite online bookstore).


    Thursday Dec 20, 2007

    Live Streaming Video

    Here we are playing with live streaming video at the December meeting of the Silicon Valley Open Solaris User Group:


    Tune in next month for more exciting content!:)

    Friday Dec 14, 2007

    NetBeans IDE 6.0 now available with features that will be in the next Sun Studio IDE

    NetBeans IDE 6.0 is now available for download. Select the C/C++ bundle, which includes the base IDE and C/C++ support. Installation on Solaris, Linux, Windows, or Mac OS X is quick and simple.

    You can create projects now using NetBeans IDE 6.0, and work with these projects in the next release of the Sun Studio IDE.

    New features added to the C/C++ support in this release of NetBeans IDE 6.0 will be available in the Sun Studio IDE in SXDE 1/08 and Sun Studio Express 1/08, which will be released in January/February. These features include:

    - Improvements in the Classes window, which lets you see all the classes in your project and their members and fields.

    - New Include hierarchy, which lets you inspect the hierarchy of source and header files in your project.

    - New Type hierarchy, which lets you inspect all types and subtypes of a class.

    - Code completion for #include directives.





    Sunday Dec 09, 2007

    Setting up memcached on Solaris

    These days everyone seems to use memcached, a high-performance, distributed memory object caching system intended to speed up web applications. Performance can improve dramatically when a disk fetch is replaced by a RAM fetch. Here is an excellent article explaining memcached, using LiveJournal as a case study.
    Did you know that memcached daemons can be set up in Solaris Zones too? For this you need to download the memcached package from the Cool Stack site.

    What you need:
    1. Cool Stack 1.2
    2. memcached Java client APIs (get this jar and this jar).

    Create and Boot Zones
    Create 3 zones - zonea, zoneb and zonec to test memcached. For information on creating Solaris zones read this article.

    I'm using SXDE 9/07. Don't have SXDE? Get it!

    Here is the status of my zones:

    # zoneadm list -vc
      ID NAME    STATUS     PATH            BRAND    IP
       0 global  running    /               native   shared
       4 zonea   running    /zones/zonea    native   shared
       5 zoneb   running    /zones/zoneb    native   shared
       6 zonec   running    /zones/zonec    native   shared

    Start memcached on all Zones

    Follow the Cool Stack site instructions for installing Cool Stack in your zones. When you run pkgadd -d memcached*.pkg, memcached gets installed in all the available zones, even those that are not in a running state.

    When everything is set, these commands should work fine:

    zonea# /opt/coolstack/bin/memcached -u phantom -d -m 100 -l -p 11111
    zoneb# /opt/coolstack/bin/memcached -u phantom -d -m 100 -l -p 11112
    zonec# /opt/coolstack/bin/memcached -u phantom -d -m 100 -l -p 11113

    This starts memcached as a non-root user on all zones, each with a 100 MB memory bucket. That is fine for testing, but in a production setup the total cache would typically be far larger, on the order of a terabyte across servers.

    If you don't know already, each Solaris zone can bind to an IP and port through the virtual interface. So you don't need 3 NICs or 3 machines - but just 3 zones.

    Test memcached

    Set the classpath to point to the downloaded jars. Use NetBeans for simplicity. Store an object in memory and retrieve it from the memcached daemon running on zonea:

        //Interact with zonea
        MemcachedClient c;
        try {
            c = new MemcachedClient(new InetSocketAddress("", 11111));

            String test = "I'm going to be cached!";
            c.set("mykey", 180, test);
            Object obj = c.get("mykey");
        } catch (IOException ex) {
            ex.printStackTrace();
        }

    We are storing the object for 3 minutes. After retrieving the object, you can clean the cache. When you compile and run the program, the output will look like:

    2007-10-03 10:38:49.615 INFO net.spy.memcached.MemcachedConnection: Connected to {QA sa=/, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} immediately
    I'm going to be cached!

    From your code, you can also connect to multiple memcached servers and store objects.
    This is quite interesting: you can halt one zone and then try to store an object in memory on all three zones.

    # zoneadm -z zonea halt
    # zoneadm list -vc
      ID NAME    STATUS     PATH            BRAND    IP
       0 global  running    /               native   shared
       5 zoneb   running    /zones/zoneb    native   shared
       6 zonec   running    /zones/zonec    native   shared
       - zonea   installed  /zones/zonea    native   shared

    Now zonea is no longer running memcached because the zone is down.
    Here is the modified code:

        MemcachedClient c;
        try {
            //zonea, zoneb and zonec
            c = new MemcachedClient(AddrUtil.getAddresses(...));

            String test = "I'm going to be cached on all zones!";
            c.set("mykey2", 180, test);
            Object obj = c.get("mykey2");
        } catch (IOException ex) {
            ex.printStackTrace();
        }

    We are trying to store the object on all the available servers. But zonea is offline.
    Here is the output:

    2007-10-03 11:18:27.678 INFO net.spy.memcached.MemcachedConnection:
    Added {QA sa=/, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue

    2007-10-03 11:18:27.682 INFO net.spy.memcached.MemcachedConnection:

    Connected to {QA sa=/, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} immediately

    2007-10-03 11:18:27.684 INFO net.spy.memcached.MemcachedConnection:

    Connected to {QA sa=/, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} immediately

    I'm going to be cached on all zones!

    The object is queued for insertion whenever zonea comes back up. It would be interesting to test memcached's automatic failover behavior, given that memcached is the mother of all hashtables and sufficient fail-safe plumbing is needed between running memcached daemons. You can also use DTrace to debug memcached.

    Wednesday Nov 07, 2007

    New Book: USING OPENMP

    In October, MIT Press published the first really comprehensive book on parallel programming using the OpenMP API.

    Using OpenMP,
    Portable Shared Memory Parallel Programming, by Barbara Chapman, Gabriele Jost and Ruud van der Pas, with a foreword by David J. Kuck, covers shared-memory parallelization on a number of platforms.

    I've had a pre-publication peek at it (because one of the authors, Ruud van der Pas, is a Sun engineer I work very closely with), and I can say that it packs a lot of very useful information in one place.

    Most useful is a discussion of how cache lines can interfere with OpenMP parallelization and cause performance and scaling degradation if you are not aware of what's going on.
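The cache-line interference the book discusses is usually called false sharing. Here is a minimal C sketch of the standard mitigation, padding per-thread data out to a cache line; the 64-byte line size, the struct, and the function name are my own assumptions for illustration, not code from the book:

```c
#include <stddef.h>

#define CACHE_LINE 64  /* assumed line size; the real value is hardware-specific */

/* Plain adjacent counters share a cache line, so when different OpenMP
   threads update neighboring elements the line bounces between cores
   and scaling collapses. Padding each counter to a full line avoids
   the interference, at the cost of some memory. */
struct padded_counter {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
};

/* Sum the per-thread slots after a parallel region has filled them in. */
long total(const struct padded_counter *slots, int nslots) {
    long sum = 0;
    for (int i = 0; i < nslots; i++)
        sum += slots[i].value;
    return sum;
}
```

In an OpenMP loop, thread t would update slots[t].value; with unpadded longs the same loop can run slower in parallel than serially.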

    This will be a very valuable book on my shelf when it's available.



    The Book On Parallel Programming

    Wilkinson and Allen's PARALLEL PROGRAMMING was published in 1999 (Prentice Hall). A revised edition came out in 2005. And I finally got my hands on it.

    It IS expensive.  Maybe you can now find used copies (be sure it's the 2nd edition!)

    Used as a computer science text, it is a complete course on parallel programming techniques and algorithms, and covers MPI, Pthreads, and OpenMP. The new material covers distributed shared memory (DSM) systems.

    And, as a course, it is rich with examples and end-of-chapter homework problems. Some of them are really creative, like: "Write a multithreaded program to simulate two automatic teller machines being accessed by different persons on a single shared account."

    Some of the typical parallel applications explored are image processing, numerical algorithms, and sorting and searching techniques. 

    I went through the first edition (minus the problems) when it first came out. Now this expanded, revised edition has me totally entranced, again. This is a great book, and I highly recommend it to anyone thinking seriously about parallel programming, especially for technical computing applications.

    Monday Oct 08, 2007

    Learning About Parallel Programming

    Clearly, multithreaded or parallel programming is growing in importance as new chip architectures move toward multiple cores and multiple threads rather than merely increasing clock speeds.

    But parallel programming is hard, no doubt about it. It's so much easier to cast a computational problem into a serial, one-step-at-a-time framework than think of it in terms of data distribution over networks of compute nodes.

    So how do you learn parallel programming?

    Luckily there are a number of resources on the web. Here are just a few links:

    Wednesday Jun 27, 2007

    Making Sense of Parallel Programming Terms

    Making Sense of Parallel Programming Terms is an article that ties together much of the terminology of multithreaded or parallel programming. 

    New Article: Prefetching

    Prefetching Pragmas and Intrinsics

    Diane Meirowitz, Senior Staff Engineer, and Spiros Kalogeropulos, Staff Engineer, June, 2007


    Explicit data prefetching pragmas and intrinsics for the x86 platform, and additional pragmas and intrinsics for the SPARC platform, are now available in the Sun Studio 12 compilers, released June 2007.

    Prefetch instructions can increase the speed of an application substantially by bringing data into cache so that it is available when the processor needs it. This benefits performance because today's processors are so fast that it is difficult to bring data into them quickly enough to keep them busy, even with hardware prefetching and multiple levels of data cache.

    The compilers have several options that enable them to generate prefetch instructions automatically: -xprefetch, -xprefetch_level, and -xprefetch_auto_type (described below). The compilers generally do an excellent job of inserting prefetch instructions, and this is the most portable and best way to use prefetch. If finer control of prefetching is desired, prefetch pragmas or intrinsics can be used. Note that the performance benefit due to prefetch instructions is hardware-dependent and prefetches which improve performance on one chip may not have the same effect on a different chip. It is a good idea to study the instruction reference manual for the target hardware before inserting prefetch pragmas or intrinsics. Furthermore, the Sun Studio Performance Analyzer can be used to identify the cache misses of an application.
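To give a feel for what a prefetch intrinsic does, here is a hedged C sketch using GCC's __builtin_prefetch, a portable analogue of the Sun Studio intrinsics the article describes; the function name and the prefetch distance of 16 elements are invented for the example:

```c
#include <stddef.h>

/* Sum an array, asking the hardware to start fetching data several
   elements ahead of the loop. The distance is a guess: tune it for the
   target chip, since a prefetch that helps on one processor may do
   nothing on another. */
double sum_with_prefetch(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            /* args: address, rw (0 = read), temporal locality hint */
            __builtin_prefetch(&a[i + 16], 0, 1);
        s += a[i];
    }
    return s;
}
```

As the article notes, measuring with the Performance Analyzer before and after is the only reliable way to know whether such a hint pays off.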

    [Read More]

    Monday Jun 25, 2007

    Project D-Light Tutorial


    Project D-Light is a plug-in for Sun Studio 12. The plug-in offers a variety of instrumentation that takes advantage of the DTrace debugging and performance analysis functionality in the Solaris Operating System.

    Project D-Light System Requirements

    Project D-Light currently runs only on the Solaris 10 OS and requires a DTrace-enabled Java Runtime Environment, version 6 or later. The Solaris Express Developer Edition 05/07 OS, which is available free of charge, is the recommended platform for running the tool. To check your Solaris version, type cat /etc/release at a shell prompt.


    [Read More]

    Saturday Jun 23, 2007

    Technical Articles on Performance Tuning

    Here's a list of the current SDN articles dealing with tuning and optimization of applications on Solaris using Sun Studio compilers: 

    Here are examples of using a compiler flag or inline assembly language with Sun Studio compilers to increase the performance of C, C++, and Fortran programs. (June 4, 2007)
    This article describes how to profile an IBM WebSphere Application Server (WAS) runtime environment with the Sun Studio Performance Analysis Tools, Collector and Analyzer. (January 30, 2007)

    The SHADE library is an emulator for SPARC hardware. The particular advantage of using SHADE is that it is possible to write an analysis tool which gathers information from the application being emulated. The SHADE library comes with some example analysis tools which track things like the number of instructions executed or the frequency that each type of instruction is executed. A more advanced analysis tool might look at cache misses that the application encounters for a given cache structure.
    (September 29, 2006)

    Click on the link below to see the complete list.
    [Read More]

    Friday Jun 22, 2007

    C and C++ Technical Articles

    There are a lot of useful, short, technical articles on the SDN portal directed at the C and C++ programmer on Solaris. 

    This article gives an overview of the C-language extensions (part of the GNU C-implementation) introduced in the Sun Studio 12 C compiler. Although these extensions are not part of the latest ISO C99 standard, they are supported by the popular gcc compilers. (June 13, 2007)
    Ever wonder what's in a patch? Take a look inside a typical Sun Studio compiler patch that includes updated C++ shared libraries. (November 10, 2006)
    The Sun C++ compiler ships with two libraries that implement the C++ standard library: libCstd and libstlport. This article discusses the differences between the two libraries and explores the situations in which one library is preferred over the other. (May 17, 2006)
    Hiding non-interface symbols of a library within the library makes the library more robust and less vulnerable to symbol collisions from outside the library. To get the appropriate linker scoping in a source program, you can now use language extensions built into the Sun Studio C/C++ compilers. (May 17, 2005)
    The author shows how a simple C++ program fails without language linkage, but can succeed with proper linkage. (This article is on the Solaris portal.) (March 23, 2005)
    In this test case, the DTrace capability in the Solaris 10 OS is used to identify an error common to C++ applications -- the memory leak. (This article is on the Solaris OS developer portal.) (February 17, 2005)
    The C++ language provides mechanisms for mixing code that is compiled by compatible C and C++ compilers in the same program. You can experience varying degrees of success as you port such code to different platforms and compilers. This article shows how to solve common problems that arise when you mix C and C++ code, and highlights the areas where you might run into portability issues. (July 25, 2003)
    This article discusses some of the runtime errors related to memory management and how you can use the garbage collection library, libgc to fix these errors. In most cases, just linking with the library without making any changes to your code will fix the errors. (July 25, 2003)
    Using the restrict qualifier appropriately in C programs may allow the compiler to produce significantly faster executables. (July 17, 2003)
    A discussion of the evolution of the C++ Application Binary Interface, and its implications. (March 20, 2003)
    Application software developers can learn to use the latest version of the Solaris OS while supporting previous versions. (January 1, 2002)
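One of the articles above covers the restrict qualifier; a minimal C sketch of the idea, with a hypothetical function name of my own:

```c
#include <stddef.h>

/* With restrict, the programmer promises that dst and src do not
   overlap. Freed from aliasing worries, the compiler can keep values
   in registers, reorder loads and stores, and vectorize the loop. */
void scale_add(size_t n, double *restrict dst,
               const double *restrict src, double k) {
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

Calling this with overlapping arrays is undefined behavior, which is exactly the bargain that makes the faster code legal.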

    New Article: Fortran Library Functions - How To Call Them

    Here's a new article from Michael Ingrassia, one of the lead developers in the Fortran Compiler team at Sun:



    The Fortran language, both the standard language adhered to by all vendors and the extended language offered to Sun customers, contains several different models for how to call library functions. Confusion can result if you aren't clear about which model you should use for the particular library function you are calling. In this paper we try to describe the different models and provide examples of what might go wrong.

    In general, any function found in a library can be called from Fortran. (Subroutines are rarer; for convenience we'll consistently refer to functions in this article where we should properly say function/subroutine). Here's a recipe for how to do it, which contains several specific recommendations for good coding practices.

    To start, gather the following information:

    • The name of the function/subroutine
    • The correct declaration for the function/subroutine
      (sometimes this is provided in a file which must be included into your program)
    • The name of the library in which the function is supplied
      (not required for intrinsic functions)

    Depending on what you find, there are 5 different models determining how to properly call this function:

    • It's a standard Fortran intrinsic
    • It's a non-standard Fortran intrinsic
    • It's a non-intrinsic Fortran run-time library function
    • It's a non-intrinsic function not supplied in the Fortran run-time libraries, but callable from Fortran
    • It's not a Fortran-callable function

    Let's look at each situation in detail.
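For the fourth case, a function outside the Fortran run-time libraries but callable from Fortran, the classic recipe on Solaris is to match the f77-style external naming convention on the C side. A hedged sketch (the name cadd_ and its arguments are invented for illustration, and the exact convention is compiler-dependent):

```c
/* Traditional Sun f77/f95 external naming: the Fortran statement
   CALL CADD(A, B, C) binds to the lowercase C symbol "cadd_" with a
   trailing underscore, and every argument arrives by reference. */
void cadd_(const int *a, const int *b, int *c) {
    *c = *a + *b;
}
```

The by-reference arguments are the part that most often trips people up: passing a Fortran INTEGER to a C function expecting a plain int value silently corrupts the call.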


    [Read More]

    Thursday Jun 14, 2007

    Sun Studio 12 IDE Quick Start Tutorial

    Sun Studio IDE screenshot

    There's now a quick-start tutorial guide for using the new Sun Studio 12 IDE. It's here.

    Wednesday Jun 13, 2007

    New Sun Studio Article: C Language Extensions

    There's a new article on the Sun Studio SDN portal: 

    New Language Extensions in the Sun Studio 12 C Compiler

    By Dmitry Mikhailichenko, Sun Microsystems, St Petersburg, Russia, June 2007  

    This article gives an overview of the following C-language extensions (part of the GNU C-implementation) introduced in the Sun Studio 12 C compiler. Although these extensions are not part of the latest ISO C99 standard, they are supported by the popular gcc compilers.

    • The typeof keyword, which allows references to an arbitrary type
    • Statement expressions, which make it possible to specify declarations and statements in expressions
    • Block-scope label names

    The article also demonstrates how to use the new C compiler features to create generic macros, using linked-list manipulation routines as an example. Such macros semantically mimic the C++ Standard Template Library, support arbitrary data types, and provide strict compile-time type checking.

     »Read the entire article on the Sun Studio portal

    Saturday Jun 09, 2007

    SCREENCASTS: Using the Sun Studio Performance Analyzer

    Performance Analyzer lead engineer, Marty Itzkowitz, demonstrates the basics of application performance analysis, Java profiling, and hardware counter profiling using Sun Studio 12 tools in three new demos:

    Friday Jun 08, 2007

    New Features in Sun Studio 12 Compilers

    Here's what's new and changed in the Sun Studio 12 compilers: 

    A New Way To Specify 32-bit or 64-bit Address Model

    You no longer need to use the -xarch option to specify a 32-bit or 64-bit address model (ILP32 versus LP64). Two new options make it easier:

    • -m32 specifies the ILP32 model: 32-bit ints, longs, and pointer types.

    • -m64 specifies the LP64 model: 32-bit ints, 64-bit longs and pointer types. (Note that -m64 is the default on Linux platforms.)

    Deprecated -xarch Flags and Their Replacements Across SPARC and x86

    • Use -m64 in place of -xarch=generic64

    • Use -m64 -xarch=native in place of -xarch=native64

    If you are using -xarch=v9 or -xarch=amd64 to specify a 64-bit address model, use just -m64 instead. No -xarch value is required.
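The difference between the two models is easy to see with a one-line probe; a sketch, with the compile lines as comments (the file name is arbitrary):

```c
#include <stdio.h>

/* Build the same file both ways to compare the models:
     cc -m32 sizes.c    ILP32: int, long, and pointers are all 4 bytes
     cc -m64 sizes.c    LP64:  4-byte int, 8-byte long and pointers */
void print_sizes(void) {
    printf("int=%zu long=%zu ptr=%zu\n",
           sizeof(int), sizeof(long), sizeof(void *));
}
```

Note that int stays 32 bits under both models; only long and pointer types widen under -m64.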

    [Read More]

    New Article: Performance Tuning with Sun Studio and Inline Assembly Code

     There's a new article on the SDN Sun Studio portal:

    Performance Tuning With Sun Studio Compilers and Inline Assembly Code

    By Timothy Jacobson, Sun Microsystems, June 2007  
    For developers who need faster performance out of C, C++, or Fortran programs, Sun Studio compilers provide several efficient methods. Performance tuning has always been a difficult task requiring extensive knowledge of the machine architecture and instructions. To make this process easier, the Sun Studio C, C++, and Fortran compilers provide easy-to-use performance flags.

    By using performance flags, developers can quickly improve execution speed. However, sometimes compiler flags alone do not result in optimum performance. For this reason, Sun Studio compilers also allow inline assembly code to be placed in critical areas. The inline code behaves similarly to a function or subroutine call, which enables cleaner, more readable code and also enables variables to be directly accessed in the inline assembly code.

    This paper provides a demonstration of how to measure the performance of a critical piece of code. An example using a compiler flag and another example using inline assembly code are provided. The results are compared to show the benefits and differences of each approach.



    For demonstration purposes, this paper uses an academic program to generate the Mandelbrot set. The example Mandelbrot program is written in C. Computing all the pixel values of the Mandelbrot set using the Sun Studio compiler is timed. Then, optimization flags are used and the computations are timed again. Finally, example Sun Studio inline assembly code is used and the computations are timed again and compared with the previous timings. The examples demonstrate two different methods for improving performance with the Sun Studio compiler: using flags and using inline assembly code.
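This is not the article's Mandelbrot code, but a tiny sketch of what inline assembly looks like, written in GNU extended-asm syntax for x86 (Sun Studio's native mechanism also includes .il inline templates; the function name here is invented):

```c
/* Add two ints with a single x86 instruction via inline assembly.
   "+r" binds result to a register for read-write; "r" puts b in a
   register. The snippet behaves like a function call, but the compiler
   still allocates registers and accesses variables directly around it. */
static int add_asm(int a, int b) {
    int result = a;
    __asm__("addl %1, %0" : "+r"(result) : "r"(b));
    return result;
}
```

A trivial add gains nothing over plain C, of course; the technique pays off when a hot loop can use an instruction the compiler will not generate on its own.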



    Under Construction

    ... not ready yet, but stay tuned.


