Wednesday Mar 18, 2009

Participate in the Sun Studio 12 Update 1 Early Access Program

The early access (EA) program for Sun Studio 12 Update 1 was rolled out on March 17, 2009. People might wonder why they should put any effort into participating in an early access program for a software tool. Actually, there are several substantial benefits for Solaris and Linux developers who participate in this EA program, and I will explain them later. But first I need to describe the new features included in Sun Studio 12 Update 1 EA.

Sun Studio 12 Update 1 includes many significant features:

  • C/C++/Fortran compiler optimization for the new x86 chips from Intel and AMD.
  • C/C++/Fortran compiler optimization for the latest UltraSPARC and SPARC64.
  • A new stand-alone GUI debugging tool, dbxtool.
  • Full OpenMP 3.0 implementation in the compilers, with performance and debugging tool support.
  • An MPI performance analyzer with an easy-to-use GUI environment.
  • A new IDE with remote development features, based on NetBeans 6.5.

I discussed two important features, OpenMP 3.0 and the MPI performance analysis tool, in my previous blog “An Exciting Software Tool Release For The Parallel Application Developers” on Nov 11, 2008. The other new features, such as the compiler performance improvements for the new SPARC and x86 chips, the stand-alone debugging tool, and the new IDE, are very important to many developers too.

Serious software developers such as ISVs or enterprise application developers can get a good performance boost without changing a single line of source code, just by switching to the new compiler. The compiler optimization takes advantage of the new chip instructions to produce more efficient binary code.

Many traditional UNIX developers like to open several command windows to edit, compile, and debug their programs instead of developing their software within an IDE. The stand-alone debugging tool has been re-introduced at the request of many such developers. Its other big benefit is that developers can log in to a remote host and debug their code graphically with the lightweight debugging tool over X Window.

The new IDE, derived from NetBeans 6.5, provides a more productive development environment. Its remote development feature enables cross-platform development: software engineers can now develop Solaris and Linux applications directly from their Windows laptops.

Now I want to explain why you should participate in this Sun Studio 12 Update 1 EA program. Sun Studio 12 Update 1 is a major Sun Studio release, with many new features, coming after nearly two years. The EA program allows you to try these new tool features at near-final product quality immediately. You should get the performance and productivity boost right away. More importantly, the EA program lets you participate in and help shape the future direction of the Sun Studio tools. If the tools in the EA program don't meet your requirements, it is a good opportunity for you to give us feedback, and we will improve the tools to meet your specific development needs. Even if you are a student or casual developer, you can participate in the EA program and discuss any related software development issues with our senior engineers through the technical forum.

Wednesday Jul 12, 2006

A Brief Explanation Of Race Condition Problem In Parallel Programming


A race condition is a programming fault that produces undetermined program state and behavior due to unsynchronized parallel program execution. It is the fault that experienced programmers worry about most in the parallel programming space. However, race condition issues have many subtle aspects. A race condition is often caused by access to common data, but it can also occur in a sequence of operations that requires protection, such as an atomic transaction, to ensure the integrity of the overall state. Not every data race is a programming bug. Sometimes a race condition is deliberately allowed in a parallel program as a compromise for performance reasons. Last but not least, there may be a very subtle, unexpected programming error underneath the symptom of a race condition.

Here a simple and popular parallel partitioning example is used to explain these various aspects of race condition issues. Partitioning is a common task in HPC application programs that deal with a huge number of objects. The partitioning example shown below creates N Pthreads that simultaneously sort and collect the objects into N group containers according to the object attributes. During the collection, each thread counts the objects in its group and adds the group count to the common total sum to check whether any object was missed in the collection.


    // global declaration
    #include <stdio.h>
    #include <math.h>
    #include <pthread.h>
    #include "element.h"
    #include "container.h"

    #define NGRPS 30
    #define NTHRS NGRPS   // one collection thread per group

    int object_count = 0;
    element* object_array;
    container group_array[NGRPS];
    int total_count = 0;

    void* collect(void* arg)
    {
        int j;
        int group_id = *((int *) arg);
        int group_count = 0;
        attribute group_attribute = get_group_attribute(group_id);
        // references, so that updates reach the shared container and objects
        container& group_container = group_array[group_id];

        for (j = 0; j < object_count; j++) {
            element& current_object = object_array[j];
            if (current_object.collectFlag == true) continue; // this flag is initialized to false
            if (current_object.matchAttribute(group_attribute)) {
                current_object.collectFlag = true;
                group_container.add(current_object);
                group_count++;
            }
        }
        total_count += group_count;
        return NULL;
    }

    int main(int argc, char** argv)
    {
        int i;
        pthread_t pids[NTHRS];

        object_count = process_input_data(argv[1], &object_array);
        for (i = 0; i < NTHRS; i++) {
            pthread_create(&pids[i], NULL, collect, (void*) &i);
        }
        // wait for all collection threads before checking the total
        for (i = 0; i < NTHRS; i++) {
            pthread_join(pids[i], NULL);
        }
        if (total_count != object_count) {
            printf("the collected object count %d doesn't match the original object count %d\n", total_count, object_count);
        }
        return 0;
    }



Data Race Condition Problem

Data race problems occur very often in shared memory parallel programming models such as Pthreads and OpenMP programs. A data race occurs when multiple threads access a shared memory location in an undetermined order and at least one access writes new data into that location.

In the above program example, a data race occurs at the second to last statement, total_count += group_count, in the collect routine. total_count is a global variable shared by all the collection threads. While one thread is in the process of reading the value of total_count and adding its group count to it, another thread may step in and read the old value of total_count. Right after the first thread writes its new value to total_count, the second thread may overwrite it with its own new value and wipe out the computing effort of the first thread.
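One common way to remove this kind of race is to guard the shared total with a mutex. The following is only a minimal sketch and not the original program: the names run_protected_sum, add_to_total, worker_arg and total_lock are hypothetical, and each worker contributes a fixed count in place of a real collection pass.

```cpp
#include <pthread.h>
#include <vector>

// Hypothetical stand-ins for the shared total in the example above.
static int total_count = 0;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

struct worker_arg {
    int group_count;  // stands in for the result of one thread's collection pass
};

// Each worker adds its private count to the shared total while holding the
// lock, so its read-modify-write cannot interleave with another thread's.
static void* add_to_total(void* arg) {
    const worker_arg* w = static_cast<const worker_arg*>(arg);
    pthread_mutex_lock(&total_lock);
    total_count += w->group_count;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

// Runs nthreads workers, each contributing per_thread, and returns the total.
int run_protected_sum(int nthreads, int per_thread) {
    total_count = 0;
    std::vector<pthread_t> tids(nthreads);
    std::vector<worker_arg> args(nthreads);
    for (int i = 0; i < nthreads; i++) {
        args[i].group_count = per_thread;
        pthread_create(&tids[i], NULL, add_to_total, &args[i]);
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
    }
    return total_count;
}
```

Because each thread takes the lock only once, at the end of its pass, the cost of the synchronization is small compared with the collection work itself.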

The effect of a data race is quite subtle and hard to locate. In my own experience, I encountered a data race almost identical to the one described above. Because the symptom showed up only once in probably a hundred runs, I was puzzled by the program behavior and suspected many things, including unstable memory chips. I was overconfident about my programming skill and never thought of a race condition in my code until a colleague pointed it out.

Benign Data Race Condition

Not every data race condition is a harmful problem. In the above program example, another data race occurs in the collect routine, between the check of current_object's collectFlag and the update of that flag when a thread collects an object into its group container. In this example, an object attribute matches exactly one group attribute. Therefore the collectFlag check seems redundant, and troublesome because of the data race. But when you analyze the program further, you will find that the check eliminates the unnecessary computation of matching an already owned object's attribute against the group attribute. The second data race only affects how much of that matching computation is eliminated. It causes no harmful result. On the contrary, it is good to keep this kind of data race for performance reasons.

In the shared memory parallel programming model, threads must communicate critical data to one another. In principle, all the threads need to read this critical data to be aware of the current overall program state and decide how to proceed. At least one thread needs to produce a new critical data value and update it. This check-and-update pattern is quite common in parallel programming, but it is a data race by nature. If every update had to halt and synchronize all the participating threads, it would reduce the parallel computing efficiency significantly. Therefore a data race is a necessary compromise in this case.
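The check-and-update pattern can be sketched without any locking. This is an illustration under stated assumptions, not the original code: the item, scan_arg, scan and count_collected names are hypothetical, and the flag is a std::atomic<bool> accessed with relaxed ordering, which is a swapped-in technique so that the unsynchronized access stays well defined in C++. A thread that reads a stale flag only performs one extra comparison.

```cpp
#include <atomic>
#include <pthread.h>
#include <vector>

struct item {
    int attribute;                // the one group this item belongs to
    std::atomic<bool> collected;  // hint flag: "some thread already took this"
};

struct scan_arg {
    std::vector<item>* items;
    int group_id;
    int group_count;  // output, private to one thread
};

static void* scan(void* p) {
    scan_arg* a = static_cast<scan_arg*>(p);
    for (item& it : *a->items) {
        // Unsynchronized check: a stale 'false' costs one extra comparison.
        if (it.collected.load(std::memory_order_relaxed)) continue;
        if (it.attribute == a->group_id) {
            it.collected.store(true, std::memory_order_relaxed);
            a->group_count++;
        }
    }
    return NULL;
}

// Runs one scanning thread per group and returns how many items were collected.
int count_collected(int ngroups, int nitems) {
    std::vector<item> items(nitems);
    for (int i = 0; i < nitems; i++) {
        items[i].attribute = i % ngroups;
        items[i].collected.store(false, std::memory_order_relaxed);
    }
    std::vector<pthread_t> tids(ngroups);
    std::vector<scan_arg> args(ngroups);
    for (int g = 0; g < ngroups; g++) {
        args[g] = scan_arg{&items, g, 0};
        pthread_create(&tids[g], NULL, scan, &args[g]);
    }
    int total = 0;
    for (int g = 0; g < ngroups; g++) {
        pthread_join(tids[g], NULL);
        total += args[g].group_count;  // summed after join, so no shared counter
    }
    return total;
}
```

Since every item matches exactly one group, only one thread ever sets a given flag, so the result does not depend on which threads happen to see the flag in time.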

General Race Condition

A general race condition is caused by an undetermined sequence of program executions that violates the program state integrity and cannot be attributed to a single memory location access. The paper “What are Race Conditions?” by Robert H.B. Netzer and Barton P. Miller of the University of Wisconsin formally defines a general race condition. This is a much harder problem than a data race. The challenging nature of the general race problem has prompted many computer science researchers to study the transactional memory approach.

Here we continue with the partitioning program example to explain what a general race problem is. Let's say that after the first phase of collection, the partitioning program needs to fine-tune and shuffle some objects from one group container to another, as shown in the code below.


    void shuffle_objects( container* source, container* destination, element* target_objects ) {
        // remove target objects from source container
        source->remove_array( target_objects );

        // Here is a transitory state which may cause general race condition

        // add target objects to destination container
        destination->add_array( target_objects );
    }



The program looks clean and simple enough. However, a transitory state exists between the remove and add actions on the two group containers. If another thread is working on the objects to perform another computing task in parallel, it may find the target objects homeless at that critical time. The obvious answer seems to be to encapsulate the entire shuffle_objects routine in an atomic operation and not allow any other thread or process to interfere in the middle. However, this fix may not be a complete solution that meets the application's partitioning requirements. For example, if the partitioning program deals with electronic component objects and their child pin objects, the grouping of the components and their pins must be atomic to keep the parent and child objects in a consistent state.

The program code below shows a transitory state which may produce a general race condition fault.

partition_shuffle.cpp (continued)

    void shuffle_components(container* component_source, container* component_destination, element* target_components)
    {
        shuffle_objects(component_source, component_destination, target_components);

        // Here is another transitory state which may cause general race condition

        pins* target_pins = get_child_pins(target_components);
        container* pin_source = get_pin_container(component_source);
        container* pin_destination = get_pin_container(component_destination);
        shuffle_objects(pin_source, pin_destination, target_pins);
    }


Therefore the atomic transaction requirement may be imposed on a long sequence of operations.

But when the atomic sequence of operations becomes long and complex, it defeats the original intention of parallel programming. The big issue with a general race is that it is very hard to avoid in the first place and also very hard to fix, even if you are lucky enough to discover the problem. Furthermore, unlike a data race, a general race can occur in distributed memory parallel programs such as MPI programs as well as in shared memory parallel programs.
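The idea of closing the transitory state can be sketched with one lock held across both the remove and the add. This is a simplified illustration under assumptions, not the partitioning program itself: group, partition_lock, shuffle_object and is_housed are hypothetical names, and a container is reduced to a set of integer ids.

```cpp
#include <pthread.h>
#include <set>

// Hypothetical simplified container: a set of object ids.
struct group {
    std::set<int> ids;
};

// One lock guards every group, so a move is seen as all-or-nothing.
static pthread_mutex_t partition_lock = PTHREAD_MUTEX_INITIALIZER;

// Moves id from source to destination with no visible in-between state.
void shuffle_object(group* source, group* destination, int id) {
    pthread_mutex_lock(&partition_lock);
    if (source->ids.erase(id) == 1) {
        destination->ids.insert(id);
    }
    pthread_mutex_unlock(&partition_lock);
}

// Returns true if id is in either group. It takes the same lock, so it can
// never observe the object as "homeless" in the middle of a move.
bool is_housed(group* a, group* b, int id) {
    pthread_mutex_lock(&partition_lock);
    bool found = a->ids.count(id) > 0 || b->ids.count(id) > 0;
    pthread_mutex_unlock(&partition_lock);
    return found;
}
```

The single coarse lock serializes every move and every lookup, which is exactly the trade-off discussed above: the longer the sequence that must be atomic, the less parallelism remains.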

Understand The Cause Of A Race Condition Problem

Let's come back to the collect routine in the partitioning example at the beginning. There is a third data race in this routine. It is quite subtle, and its cause is hard to understand without serious investigation. In the collect routine, the statement group_container.add(current_object) has a data race problem.

Almost no programmer would believe there is a data race at this statement just by looking at the code. As a matter of fact, this data race is caused by another data race, the fourth one in such a simple program.

It is easier to explain the fourth data race first. The group ID comes from the routine argument arg, which is a pointer holding the address of the loop index i in the main program. The main thread advances the loop index i, writing a new value to the memory location that arg points to. Therefore reading this loop index and converting it to a group ID in the collect routine is a data race. Because of it, two different threads running the collect routine may get the same group ID value and produce the data race at the statement group_container.add(current_object).
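A usual remedy for this argument-passing race is to give every thread its own stable copy of the ID instead of the address of the loop index. The sketch below is illustrative only, and id_arg, read_group_id and all_ids_distinct are hypothetical names.

```cpp
#include <pthread.h>
#include <vector>

struct id_arg {
    int group_id;  // input: stable copy owned by exactly one thread
    int seen_id;   // output: the value the thread actually read
};

static void* read_group_id(void* p) {
    id_arg* a = static_cast<id_arg*>(p);
    a->seen_id = a->group_id;  // nobody else writes this slot after creation
    return NULL;
}

// Returns true if every thread saw its own distinct id.
bool all_ids_distinct(int nthreads) {
    std::vector<pthread_t> tids(nthreads);
    std::vector<id_arg> args(nthreads);
    for (int i = 0; i < nthreads; i++) {
        args[i].group_id = i;  // written before the thread starts, never again
        args[i].seen_id = -1;
        // Pass the address of this thread's slot, not the address of i.
        pthread_create(&tids[i], NULL, read_group_id, &args[i]);
    }
    bool ok = true;
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
        ok = ok && (args[i].seen_id == i);
    }
    return ok;
}
```

With distinct group IDs guaranteed, each thread works on its own group container, which also removes the dependent race on the add statement.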


Parallel programming is a new world for most software developers. Its subtlety and complexity are far beyond those of sequential programming. Software developers need to use the right tools in this new world. No doubt a race detection tool is a critical tool that parallel software developers need to learn and use regularly. Although parallel developer tools are still not mature enough, some early tools are available today and more engineering resources are being devoted to them. The current Sun Studio Express June 2006 build features the Data Race Detection Tool. You can go to the website to look for more information and get the free download if you are interested in exploring race condition problems.

Tuesday Jun 14, 2005

Debug a Large Scale and Complex Solaris Application with Sun Studio


Welcome to the launch of OpenSolaris

Sun Studio is a complete set of software tools with an Integrated Development Environment for Solaris application development. Currently the Sun Tools organization is planning to give registered OpenSolaris participants the right to use Sun Studio 10 for free. Here I would like to discuss some practical issues of debugging an application and how OpenSolaris participants can use the free Sun Studio for their Solaris application development. This blog only talks about a small set of the debugging features in Sun Studio. There is much more in the tool set waiting for you to download and try out yourself.


Debugging usually occupies a significant percentage of the software development cycle. Experienced developers tend to design and implement their code defensively to reduce the debugging pain. Very rarely can developers eliminate the need for debugging once a program reaches a modest size or complexity. The good thing about the Java programming language is that it lets developers focus more on design logic issues rather than subtle programming problems such as memory management. Java also induces a developer to think about and design the program in a more object-oriented way, which makes the code easier to debug. However, the ultimate code quality is determined by the programmer's skill and experience rather than the programming language. A bad programmer can produce poisonous code that pollutes all the modules integrated with it and makes the entire program very hard to debug. Fortunately, in the software development community, competent programmers who consistently produce good quality code with very clear logic flow are more common.

When a good programmer encounters a program fault, the first reaction is to visualize and trace the program logic flow mentally to analyze the nature of the problem. For a simple bug, the cause may be easily located without a diligent debugging effort. When the target program becomes more complex or is developed by a big team, the programmer needs to rely on a good debugging tool and a sound debugging method to investigate and nail down the bug. Here are some practical tips for Sun Studio users going after a subtle bug in a large scale and complex Solaris application.

Debugging Process

The process of debugging can be divided into three main steps:

  1. Analyze and understand the problem symptom(s) fully

  2. Trace backward the programming logic from the place of symptom(s) to the place of root cause. In some cases, a symptom is caused by the chain reaction of several bugs. Therefore this tracing step may require several iterations to go through every bug cause in the chain.

  3. Find an optimal way to fix the problem, then rebuild and verify the target program. If the newly built program doesn't work in the expected way, go back to step 1 or step 2 to repeat the process.

It may take several debugging cycles to fix a really subtle problem. In general, the second step, tracing the logic, is the most challenging and time-consuming step of the entire debugging process. Therefore our discussion here will focus on how to leverage Sun Studio to perform the second debugging step.

The Challenge of Debugging a Large Scale and Complex Program

When the target program is large scale and has complex logic, the programmer needs to overcome several big barriers in debugging the program. Some of the most significant problems are listed below:

  1. The program is a part of a very complicated software runtime environment and cannot run independently.

  2. The program fault occurs after a very long run, say hours or even days.

  3. The programming logic distance from the root cause to the symptom may be very far.

  4. The program size may be so big that the debugging tool runs very slowly, especially with breakpoints.

  5. The call stack becomes very deep and hard to follow, especially within deep recursive routine.

  6. The program is implemented by a large team, a single programmer cannot understand all the codes.

  7. The program shows symptom which seems to be caused by corrupted memory or memory leakage.

How To Leverage Sun Studio Debugging Tool

Here are some tips you may consider applying when you use Sun Studio to debug a Solaris program which has one or more of the barriers described above.

Short Cut the Setup Effort

In debugging, the first move is to get to the place where the symptom occurs and examine all the suspicious data. If you already have a core file, you can just load it into Sun Studio to examine the faulty data immediately. Eventually you still need to run the code being debugged to the symptom place or nearby. If you encounter barrier 1 or 2, you can run the target program from a command window in the regular way. A little before the program reaches the desired state, you can run the attach command from Sun Studio. A process window will pop up listing all the processes on the host machine. You can select the target process to load it into the debugger and begin your debugging work. If your program has the barrier 1 issue and fails at a very early stage, you can use the Sun Studio command ss_attach to run your target program within any runtime environment. When ss_attach executes, it initiates a connection to Sun Studio and loads the program into the Sun Studio debugger for you to debug.

Set the Breakpoint and Watch Wisely

A breakpoint lets you stop the program at a critical place to examine the data and program state. A watch shows critical data values as the program runs. Sun Studio supports both features, coupled with a conditional clause capability. When you encounter barriers 3, 4, or 5, you should think through and analyze the entire program flow to decide which program sections are critical regions to investigate and which variables are critical for checking the correctness of the program state. With good analysis and planning, you can set breakpoints at the most critical places and set watches on the most critical variables. Conditional breakpoints are very powerful: you can apply them to filter out lots of redundant breakpoint hits. However, it is costly to run a large scale program with a conditional breakpoint, or even with many regular breakpoints. Therefore you should use as few simple breakpoints as possible to pass through the safe regions, and apply conditional or more complex breakpoints only in the dangerous or uncertain areas.

Make Use of Multi-Session Feature

Sun Studio supports a multi-session feature that allows you to debug multiple program instances simultaneously. If your program fails only on a specific configuration or platform, or only on some test cases, it should be easy to find a working program instance to use as a reference. Comparing a faulty program instance with a good reference instance is a very efficient way to debug the code. For example, this method is very effective for a parallel program which succeeds in one configuration (say, a single thread running sequentially) and fails in another (multiple threads). It is also very effective for people who are not familiar with the code, as described in barrier 6. You can load multiple program processes into Sun Studio and debug a faulty process by comparing it with a reference process. Although you may think this is the same as running multiple debuggers side by side, the multi-session feature in Sun Studio provides a better environment that makes more efficient use of memory, monitor window space, and action control. The other usage of this feature is a little subtle and may be controversial. You can load multiple process instances of a single program and control them so that they run at different stages simultaneously. This gives you a simultaneous view of the program state at different stages. It lets you control the critical program steps and examine the critical data in a pipelined way instead of running the program repeatedly. When you encounter barrier 5 or 6, you can experiment with this method to debug your complex program.

Runtime Memory Access and Leakage Checking

It is quite likely for a large scale C/C++ program to have memory-related problems, such as a memory area corrupted by a wild pointer, or a routine that allocates a chunk of memory and forgets to free it before returning. It is not easy to locate this kind of memory problem with traditional debugging methods. Fortunately, Sun Studio includes a high quality runtime memory checking tool to help you find memory-related bugs. When you encounter the barrier 7 problem, you can load the target program into the debugger and turn on memory access checking or memory usage checking through the Debug Memory menu item in Sun Studio. It will pinpoint the exact violating code in the source code window. However, when your program is big, runtime memory checking is quite costly. You should find small test cases with good coverage of your program to run under memory checking.

The Non-Tool Tip

When you have struggled for long hours and still made no progress on a mysterious bug, it may be time to take a good break, cool down, and think through your debugging process. Start from the problem symptom and double check whether you fully understand the symptom and its implications. Next, try to explain to yourself the basic program logic flow from the beginning to the place where the symptom occurs, and decide whether there is a more efficient approach to the problem. It is always a good idea to discuss your puzzling thoughts or debugging approach with another person, even if you think that person would not really understand what you do. Quite often you will find the blind spot in your logic yourself while you explain it to other people.


Monday May 23, 2005

Java and Native Programming Languages

I still remember the painful days of developing my first commercial software at Daisy Systems in the early 80s. We developed software in PL/1 on Daisy's proprietary workstation, called the Logician, which sold for around $150,000 per unit. A single I/O task such as printing a line of text required 8-9 routine calls. Even worse was the debugging capability. We needed to punch the assembly code onto a tape and load the tape into ICE machines for debugging. There were around 100 programmers in our building sharing 3 ICE machines, so during intensive debugging periods I usually signed up for ICE debugging time around midnight. That was my first experience of programming for something serious. You can guess my productivity at that time.

Several years later, in another job at a start-up company called IKOS, C/C++ was our primary development language. Compared with PL/1 and Pascal at Daisy, programming in C/C++ felt like a transition from riding a bike to riding a motorcycle. It ran so fast, but it also hurt badly when it fell. We spent a very big portion of our time debugging instead of developing new code. If we could have delivered the IKOS software a little faster with a little better quality, its IPO market cap would have been different and my working years might have been fewer.

Therefore, when I first learned the Java language and saw no wild pointers and no memory corruption, my first reaction was to learn it hard and take advantage of it in my future programming life. I implemented several small projects in Java. The experience was exactly what I expected: I rarely spent much time debugging. The code would function well once it got through the compilation stage. However, so far I have never had a chance to manage or develop a large scale application in Java only. If I need to develop a large scale application in the future, honestly I don't know if I can afford to develop the entire software in Java either. To me the critical productivity issue comes down to programming style and the programmer's competency more than the language. Over the past decades, people like me have learned from experience to write programs defensively, and tools such as Sun Studio Run Time Checking and Rational Purify help too. But most importantly, my decision on a programming language would be based on business reasons and the technology trend. Some mainstream industries such as manufacturing have invested in millions and millions of lines of native code for basic infrastructure and product components. Application software vendors cannot ignore all this existing investment. They must leverage or salvage the legacy code. In some cases it may be appropriate to wrap the native code in a service module to fit into an SOA environment. But that only happens at the higher functional level, to provide services. People still need to develop lots of native plumbing code to connect basic code units. Some programs use interpreted languages such as Perl and Python for such plumbing work. But it takes the same native programming language to tie things together when they were not originally designed to connect.

The emerging technology trend of parallel computing, from CMT machines to clusters of machines, will reinforce my view that many mainstream industries cannot afford to develop an entire application in Java. No doubt Java and the JVM can benefit significantly from CMT: Java threads can be mapped to and executed on multiple processor strands or cores of a CMT machine. But many application developers will feel more comfortable creating and manipulating the threads directly. An industry standard such as OpenMP is a good example of how software people can write multi-threaded programs in both C/C++ and Fortran very efficiently. Interconnects with near-zero latency and high bandwidth now make running an application on multiple machines very appealing. SOA is also riding on this cluster technology trend. But the interactions among web servers are loosely coupled. The boundaries of service modules must be at a higher level, and the transactions cannot be too frequent. A high performance application cannot be parallelized in this way. Another industry standard, MPI (Message Passing Interface), has become quite popular in the native programming world for parallelizing a large scale application across up to thousands of computing nodes.

If you have been patient enough to read this far, you may think I am heavily biased toward native programming. Not quite. I really like Java's simpler and cleaner language semantics. It extracts better programming productivity from a mediocre programmer with less management hassle. Java's extensive API libraries, its community software infrastructure, and its universal portability trim application development costs very significantly. I would still consider Java first when I want to start an application software project. Here I am just asking whether I can afford to design the entire software project in Java in many circumstances, especially for parallel computing applications.


