Wednesday Mar 18, 2009

Participate in the Sun Studio 12 Update 1 Early Access Program

The early access program for Sun Studio 12 Update 1 was rolled out on March 17, 2009. People might wonder why they should put any effort into participating in a software tool early access program. Actually, there are several substantial benefits for Solaris and Linux developers in joining this EA program, and I will articulate the reasons later. But first I need to explain what new features are included in Sun Studio 12 Update 1 EA.

Sun Studio 12 Update 1 includes many significant features:

  * C/C++/Fortran compiler optimizations for the new X86 chips from Intel and AMD.
  * C/C++/Fortran compiler optimizations for the latest UltraSPARC and SPARC64.
  * A new stand-alone GUI debugging tool, dbxtool.
  * A full OpenMP 3.0 implementation in the compilers, with performance and debugging tool support.
  * An MPI performance analyzer with an easy-to-use GUI environment.
  * A new IDE with remote development features, based on NetBeans 6.5.

I discussed two important features, OpenMP 3.0 and the MPI performance analysis tool, in my previous blog “An Exciting Software Tool Release For The Parallel Application Developers” on Nov 11, 2008. The other new features, such as the compiler performance improvements for the new SPARC and X86 chips, the stand-alone debugging tool and the new IDE, are very important to many developers too.

Serious software developers, such as ISVs or enterprise application developers, can get a good performance boost without changing a single line of source code, simply by switching to the new compiler. The compiler optimization takes advantage of the new chip instructions to produce more efficient binary code.

Many traditional-style UNIX developers like to open several command windows to edit, compile and debug their programs instead of developing their software within an IDE. The stand-alone debugging tool has been re-introduced at the request of many such developers. The other big benefit of the stand-alone debugging tool is that it lets developers log in to a remote host and debug their code graphically with the light-weight remote debugging tool through X-window.

The new IDE provides a more productive development environment derived from NetBeans 6.5. Its remote development feature enables cross-platform development: software engineers can now develop Solaris and Linux applications directly from their Windows laptops.

Here I want to explain why you should participate in this Sun Studio 12 Update 1 EA program. Sun Studio 12 Update 1 is the first major Sun Studio release with many new features in nearly two years. The EA program allows you to try these new tool features, at near final product quality, immediately, so you should get the performance and productivity boost right away. More importantly, the EA program lets you participate in and help shape the future direction of the Sun Studio tools. If the tools in the EA program don't meet your requirements, it is a good opportunity for you to give us feedback, and we will improve the tools to meet your specific development needs. Even if you are a student or casual developer, you can participate in the EA program and discuss any related software development issues with our senior engineers through the technical forum.

Tuesday Nov 11, 2008

An Exciting Software Tool Release For The Parallel Application Developers

The Sun Studio Express 11/08 release went live on November 12, 2008. This new release has many important new features, although it is an express release. The new feature list includes an updated IDE based on NetBeans 6.5, remote development and debugging, a new stand-alone light-weight GUI debugging tool, an integrated DTrace plug-in, and compiler enhancements that generate machine code with better run-time performance on both SPARC and X86 systems. Each of these new features deserves a good article of its own. Here I would like to discuss two very important tools for parallel programming: OpenMP 3.0 and the MPI performance analysis tool.

Both tools are very important in their own parallel programming domains. OpenMP 3.0 is a major leap from the older version, adding dynamic tasking to the shared memory programming model, and the MPI performance analyzer is an indispensable tool for MPI distributed programming. Here I will explain why serious parallel application developers should feel excited about these two tools when designing their parallel applications.

OpenMP 3.0 introduces a new feature, dynamic tasking. Instead of parallelizing the code statically with OpenMP parallel loops or parallel regions, parallel application developers can now create parallel tasks dynamically in the logic flow and expect these tasks to run concurrently. This new feature gives developers much needed programmability. Graph traversal is a practical example that illustrates the programming power of dynamic tasking. Many applications are designed to find an optimal solution for a very complex problem. Usually the problem can be transformed and modeled as a large state space graph, and the program's algorithm is implemented as finding the optimal graph node by traversing the graph. The programmer can map the processing of a graph node to an OpenMP task. During processing, a task can create as many new tasks as the current node has child nodes, specifically one new task for each child node. Therefore a complex graph traversal problem can be implemented simply as a task that processes a graph node and creates new tasks for the corresponding child nodes. The program starts with a single task mapped to the root node, and that task generates more and more tasks which eventually cover all the graph nodes.
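The traversal idea above can be sketched with OpenMP 3.0 tasks. This is my minimal illustration, not code from the release; the node type is a made-up example, and it assumes a tree (a general graph would also need a per-node visited flag).

```c
#include <stddef.h>

typedef struct node {
    struct node **children;   /* hypothetical child pointer array */
    int nchildren;
} node;

int visited = 0;   /* shared counter standing in for "process the node" */

/* Process one node (here: just count it), then spawn one OpenMP
 * task per child node. Each task may in turn spawn more tasks,
 * so the traversal unfolds dynamically at run time. */
void traverse(node *n)
{
    int i;
    #pragma omp atomic
    visited++;
    for (i = 0; i < n->nchildren; i++) {
        #pragma omp task firstprivate(i)
        traverse(n->children[i]);
    }
    #pragma omp taskwait   /* wait for this node's subtree to finish */
}

/* Start with a single task on the root node, as described above. */
void traverse_from_root(node *root)
{
    #pragma omp parallel
    #pragma omp single     /* one thread creates the root task */
    traverse(root);
}
```

Compiled without OpenMP support the pragmas are simply ignored and the code degenerates into a sequential recursive traversal, which makes the structure easy to check.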

OpenMP 3.0 also includes some other very helpful features, such as loop collapse and new environment variables and routines for runtime scheduling. The Sun Studio compiler implements all these new features too. More importantly, the Sun Studio compiler keeps its tradition as a very high quality performance tool by supporting the new OpenMP 3.0 features with top run-time performance in the industry.
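For example, the loop collapse feature merges a nest of loops into one larger iteration space. A minimal sketch of the new clause (my illustration, not tied to any Sun Studio specifics):

```c
#define N 4
#define M 5

/* collapse(2) flattens the i and j loops into one N*M iteration
 * space, so the runtime can balance all 20 iterations across the
 * threads instead of only the 4 outer ones. */
int count_iterations(void)
{
    int i, j, sum = 0;
    #pragma omp parallel for collapse(2) reduction(+:sum)
    for (i = 0; i < N; i++)
        for (j = 0; j < M; j++)
            sum += 1;
    return sum;   /* N * M = 20 */
}
```

The clause requires the loops to be perfectly nested, as they are here; without OpenMP the pragma is ignored and the result is the same.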

MPI is the de facto standard for the distributed programming model. Today when people talk about HPC, MPI is the first thing that comes to mind. Actually, MPI programming is not easy for most developers, but the harder problem is scaling up the parallel performance of an MPI application. An application must need a multi-fold performance speedup to justify adopting the MPI programming model, so failing to scale up an MPI program's parallel performance is unacceptable. Therefore an MPI performance analyzer is the most fundamental tool for MPI application developers, after the MPI library itself.

There are several MPI performance tools on the market, ranging from very expensive commercial products to free open source ones. Why does Sun Studio offer its own MPI performance analyzer? What is unique about this tool? The quick answer is that the Sun Studio MPI performance analyzer is the best of its kind. Here I would like to discuss its unique features in core capability and ease of use. When you analyze and tune the performance of a distributed program, you need to analyze both the computation and communication cost and find the right balance between the two. The Sun Studio Analyzer is one of the best tools in the industry for analyzing computation performance. It has the most comprehensive instrumentation mechanism to collect all the key runtime performance data, and it allows programmers to analyze these data through various analysis views, from the time line, call tree and source code window to the data space view. The MPI performance analyzer adds the complementary capability of tracing MPI messages to analyze the communication cost. In addition, by collaborating with the Sun ClusterTools (MPI library) team, the MPI performance tool adds MPI states to make the measurement of communication cost more precise. Therefore the Sun Studio performance analyzer has the best core capability for analyzing both computation and communication cost.

The Sun Studio MPI performance analyzer also presents a new and simple user model for MPI developers. The instrumentation is similar to the Analyzer's original data collection interface for a sequential program, so it is easy to run a performance experiment to collect the performance data. The MPI analyzer provides a simple and easy user interface. The main time-line window, as shown below, provides a two dimensional view of each MPI process's computation time and the messages between the processes. It is quite handy to zoom in and out on either dimension through the corner control console.

Additionally, the tool offers a two dimensional chart window, as shown below, with both the X axis and Y axis configurable to any performance parameter. It is very convenient for the developer to measure the impact of any performance parameter, or evaluate the interaction between any pair of performance parameters, with the chart analysis window.

Parallel programming remains relatively hard for most application developers, even with the current advanced software technology. If you are developing a parallel application, or plan to develop one, the most important thing is to use the right tools to help you. I hope this blog sparks your curiosity to download and try out the Sun Studio Express 11/08 release.

Tuesday Sep 09, 2008

PHP Programs to Convert an IP Address into a Country Code

For many web sites, the site owners need to know where their visitors come from. We at the Sun Developer Tools organization want to know where the developers downloading Sun Studio software are located. We needed a utility to convert an IP address into a country code. I searched the Internet and found many commercial offerings. However, I only needed the core functionality and didn't want to buy a commercial product with a complex user interface. After studying some public PHP code, I wrote a complete utility set of PHP programs to convert an IP address into a country code. I think my utility package has the basic functionality needed by many people, so I decided to share the PHP programs here. You can download the complete set of PHP programs ip2c.tar.gz here. After downloading the file, you should run the gunzip and tar commands to un-archive it. The tar file package contains the following programs:

initdb.php: A PHP program to convert the text IP DB into a MySQL DB.

check.php: A PHP verification program to dump all the stored records in the MySQL DB.

ip2c.php: A PHP program to convert an IP address into a country code from the text DB. (slow version)

ip2c_db.php: A PHP program to convert an IP address into a country code from the MySQL DB. (fast version)

Here are the simple steps to run this program:

1. Make sure the included php.ini file is in your current directory

-- It is needed to increase the PHP program memory limit

2. Put your input IP addresses in input.txt (one address per line)

-- See the example input.txt file

3. Run the program as: php ip2c.php [php ip2c_db.php]

4. The output will appear both on screen and in the output.txt file.
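The core trick behind this kind of converter is that IP-to-country databases typically store each country's address blocks as numeric ranges, so the dotted-quad address must first be turned into a 32-bit integer before the range lookup. A minimal C sketch of that conversion (my illustration of the general technique, not code from the ip2c package):

```c
#include <stdio.h>
#include <stdint.h>

/* Convert a dotted-quad IPv4 address string into the 32-bit integer
 * form used for range lookups in IP-to-country databases.
 * Returns 0 if the string does not parse as four octets. */
uint32_t ip_to_number(const char *ip)
{
    unsigned a, b, c, d;
    if (sscanf(ip, "%u.%u.%u.%u", &a, &b, &c, &d) != 4)
        return 0;
    return ((uint32_t)a << 24) | (b << 16) | (c << 8) | d;
    /* e.g. "192.168.1.1" -> 3232235777 */
}
```

With the address in integer form, finding the country is a binary search for the range whose start and end values bracket the number, which is essentially what the fast MySQL-backed version delegates to the database.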

Enjoy using this package, and let me know if you encounter any problems. If you find the package helpful, you may find many other interesting and helpful tools at the Sun Studio site.

Monday Feb 26, 2007

I Like My Solaris Laptop

I have a Sony Vaio VGN-T350P laptop. It is a very light laptop, 1” thin and 3.1 lbs, with a 10.6” WXGA TFT screen. It has an Intel Pentium M 1.0 GHz processor and 500 Mbytes of main memory, which I extended to 1 Gbyte. Because business travel is part of my job, I selected this light and small laptop to replace my old and heavy HP Pavilion Zt3000 in the summer of 2005. It turned out to be a very good decision. This Sony laptop has accompanied me to many cities to present to and meet with many Sun customers. In 2006, I decided to install Solaris 10 on this laptop to make it a dual boot system, and I have kept upgrading it with the later Solaris releases. Now it runs the early access release of Solaris Express, Developer Edition.

Although the initial Solaris installation was a little harder than I expected, the later release upgrades turned out to be quite simple for me. It usually took me about 100 minutes to upgrade to a new Solaris release. The reward from the Solaris installation is definitely worth the effort I put in. Now I spend more time in Solaris than in Windows on this laptop; in fact, I wrote and edited this blog on my Solaris laptop. The Solaris desktop environment is bundled with many utilities and application tools. For example, I used the GIMP image editor on Solaris to create the desktop wallpaper shown in the picture below.

No doubt, the latest Solaris Express release simplifies the installation process significantly. However, some installation problems I encountered are machine dependent and may be too hard for the average user to solve. If you have a similar Sony laptop, you can follow the instructions here to fix the problems. Because my laptop has an odd 1280x768 screen size, the biggest challenge I encountered was extending the Solaris image to the full screen size. The screen size problem is caused by a defect in the Intel video device driver for its 855 graphics chip. The device driver doesn't work correctly for either Solaris or Linux: the video table doesn't contain the right screen resolution setting of 1280x768. Therefore even the right screen configuration in the xorg.conf file still cannot make Solaris use the full screen size. A program called 855resolution is required to patch the video device driver. 855resolution was initially developed by Alain Poirier for Linux, and Kentaro A. Kurahone ported it from Linux to FreeBSD. I searched the internet and followed a couple of people's advice (Mario Hakansom is one of them) to port the program from FreeBSD to Solaris and build it with the Sun Studio compiler. You can download the Solaris 855resolution package tar file 855res.tar here. The tar file includes the README file, the binary and the source code. You can follow the instructions to use the binary directly without re-building it.

The second problem is the external display. I use this laptop to present my talks quite often, so it is important to be able to plug an external projector into the laptop and display the computer image on the big screen at any time. I had ported an i855crt package from Linux to Solaris to solve this problem for an earlier Solaris version, but I no longer need this display driver: the later Solaris Nevada Express releases, starting from build 46, fixed the problem. You definitely also need the right xorg.conf settings to make the external display work. My xorg.conf is included in the above Solaris 855resolution package, but you can also download the individual file xorg.conf here.

The third problem, configuring the wireless connection, is relatively easy but a little tedious.

1. Download the Intel wireless device driver iwi 0.5 package from the OpenSolaris community Intel wireless driver page and install it.

2. Run the wificonfig command in super-user mode. My home wireless connection uses WEP encryption, so it is more convenient to create a wireless profile for home usage. Here is the command: wificonfig createprofile <ProfileName> essid=<ID> encryption=WEP wepkey1=<password>

3. After the profile is created, run the following two commands in super-user mode to complete the wireless configuration: (1) wificonfig -i iwi0 connect Profile_name (2) ifconfig iwi0 dhcp

The above steps should enable the wireless connection on your Solaris laptop. However, your laptop may still not be able to access the internet; the usual cause is domain name resolution. The first step to resolve the DNS problem is to edit the file /etc/nsswitch.conf and add “dns” after “files” in the line beginning with hosts:. The second step is to find out the domain name and domain name server IP address of your home internet connection; you can find this information easily by looking at the network connection properties of another Windows computer. Then edit the file /etc/resolv.conf to add the lines domain domain_name and nameserver name_server_IP_address. The easier way to do the second step is to run the Solaris network configuration tool from Start > Administration > Network. It will prompt you to enter the root password first. After that, you can select “Wireless connection” from the main panel and click the “DNS” tab to enter your DNS domain and name server IP addresses.

I really like my Solaris laptop, and it is a very productive environment for me. Solaris Express is bundled with StarOffice 8 for writing design documents and slide presentations. It is also bundled with Sun Studio and NetBeans, so I can develop my programs anywhere I like. Recently I installed the SAMP stack and played with it by writing a simple web application on my laptop. I was shocked by how much I could do with PHP and MySQL in a very short time. That may be another interesting topic I should talk about next time.

Wednesday Jul 12, 2006

A Brief Explanation Of Race Condition Problem In Parallel Programming


A race condition is a programming fault that produces undetermined program state and behavior due to un-synchronized parallel execution. It is the programming fault that experienced programmers worry about most in the parallel programming space, and there are many subtle aspects to race condition issues. A race condition problem is often caused by access to shared data, but it can also occur in a sequence of operations which requires protection, such as an atomic transaction, to ensure overall state integrity. Not every data race is a programming bug: sometimes a race condition is deliberately allowed in a parallel program for performance reasons. Last but not least, a very subtle, unexpected programming error may lie underneath a race condition symptom.

Here a simple and popular parallel partitioning example is used to explain these various aspects of race condition issues. Partitioning is a common task in HPC application programs dealing with a huge number of objects. The partitioning example shown below creates N Pthreads to sort and collect objects into N group containers simultaneously, according to the object attributes. During the collection, each thread counts its group's objects and adds the group count to the common total sum, to check whether any object was missed in the collection.


// global declaration
#include <stdio.h>
#include <math.h>
#include <pthread.h>
#include "element.h"
#include "container.h"

#define NGRPS 30
#define NTHRS NGRPS   // one collection thread per group

int object_count = 0;
element* object_array;
container group_array[NGRPS];
int total_count = 0;

void* collect(void* arg)
{
    int j;
    int group_id = *((int *) arg);
    int group_count = 0;
    attribute group_attribute = get_group_attribute(group_id);
    container group_container = group_array[group_id];

    for (j = 0; j < object_count; j++) {
        element current_object = object_array[j];
        if (current_object.collectFlag == true) continue; // this flag is initialized to false
        if (current_object.matchAttribute(group_attribute)) {
            current_object.collectFlag = true;
            group_container.add(current_object);
            group_count++;
        }
    }

    total_count += group_count;
    return NULL;
}

int main(int argc, char** argv)
{
    int i;
    pthread_t pids[NTHRS];

    object_count = process_input_data(argv[1], &object_array);
    for (i = 0; i < NTHRS; i++) {
        pthread_create(&pids[i], NULL, collect, (void*) &i);
    }
    for (i = 0; i < NTHRS; i++) {
        pthread_join(pids[i], NULL);
    }
    if (total_count != object_count) {
        printf("the collected object count %d doesn't match the original object count %d\n",
               total_count, object_count);
    }
    return 0;
}

Data Race Condition Problem

Data race problems occur very often in shared memory parallel programming models, such as Pthreads and OpenMP programs. A data race occurs when multiple threads access a shared memory location in an undetermined order and at least one access writes new data to that location.

In the above program example, a data race occurs at the second to last statement of the collect routine, total_count += group_count. total_count is a global variable shared by all the collection threads. While one thread is in the process of reading the value of total_count and adding its group count to it, another thread may step in and read the old value simultaneously. Right after the first thread writes its new value to total_count, the second thread may overwrite it with its own new value and wipe out the first thread's computation.
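One standard fix for this particular race (my sketch, not part of the original example) is to serialize the read-modify-write of total_count with a Pthread mutex:

```c
#include <pthread.h>

int total_count = 0;
pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each collect thread calls this once with its private group tally.
 * The mutex makes the read-modify-write of total_count atomic, so
 * no thread can overwrite another thread's contribution. */
void add_group_count(int group_count)
{
    pthread_mutex_lock(&total_lock);
    total_count += group_count;
    pthread_mutex_unlock(&total_lock);
}
```

The cost is one lock acquisition per thread, which is cheap here because each thread takes the lock exactly once, at the end of its run.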

The effect of a data race is quite subtle and hard to locate. In my own experience, I once encountered a data race almost identical to the problem mentioned above. Because the symptom only showed up once in perhaps a hundred runs, I felt puzzled about the program's behavior and suspected many things, including unstable memory chips. I was overconfident about my programming skill and never thought of a race condition in my code until a colleague pointed it out.

Benign Data Race Condition

Not every data race condition is a harmful problem. In the above program example, another data race occurs in the collect routine: at the check of current_object's collectFlag and the update of that flag when a thread collects an object into its group container. In this example, an object attribute uniquely matches one group attribute only, so the collect flag check seems redundant and, because of the data race, troublesome. But when you analyze the program further, you will find that the collect flag check eliminates the unnecessary computation of matching an already-collected object's attribute against the group attribute. This second data race only affects the degree to which that matching computation is eliminated; no harmful result can come from it. On the contrary, it is good to keep this kind of data race for performance reasons.

In the shared memory parallel programming model, the threads must communicate critical data among one another. In principle, all the threads need to read this critical data to be aware of the current overall program state and decide how to proceed, and at least one thread needs to produce a new value of the critical data and update it. This kind of check-and-update pattern is quite common in parallel programming, but it causes a data race by nature. If every update action had to halt and synchronize all the participating threads, it would reduce the parallel computing efficiency significantly. Therefore a data race is a necessary compromise in this case.
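As an aside of mine (not from the original post), compilers also offer atomic primitives that implement this check-and-update pattern without halting all threads. A sketch using GCC-style builtins, with a shared "best score" as the critical data:

```c
int best = 0;   /* shared critical data: best score published so far */

/* Each thread calls this with its candidate score. The compare-and-swap
 * loop retries until either the update sticks or another thread has
 * already published something at least as good; no thread ever has to
 * stop the others to make progress. */
void publish_if_better(int candidate)
{
    int seen = best;
    while (candidate > seen) {
        if (__sync_bool_compare_and_swap(&best, seen, candidate))
            break;         /* our value is now the shared state */
        seen = best;       /* someone else updated it; re-check */
    }
}
```

A thread that loses the swap simply re-reads the newer value, so the only synchronization cost is the atomic instruction itself.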

General Race Condition

A general race condition problem is caused by an undetermined sequence of program executions that violates the program's state integrity, and it cannot be attributed to a single memory location access. The paper “What are Race Conditions?” by Robert H.B. Netzer and Barton P. Miller from the University of Wisconsin formally defines a general race condition. This is a much harder problem than a data race, and its challenging nature prompts many computer science researchers to study the transactional memory approach.

Here we continue with the partitioning program example to explain what a general race problem is. Let's say that after the first phase of collection, the partitioning program needs to fine tune the result and shuffle some objects from one group container to another, as shown in the code below.


void shuffle_objects(container* source, container* destination, element* target_objects)
{
    // remove target objects from the source container
    source->remove_array(target_objects);

    // Here is a transitory state which may cause a general race condition

    // add target objects to the destination container
    destination->add_array(target_objects);
}



The program looks clean and simple enough. However, a transitory state exists between the remove and add actions on the two group containers. If another thread is working on the objects to perform another computing task in parallel, it may find the target objects are homeless at that critical time. The obvious answer seems to be to encapsulate the entire shuffle_objects method in an atomic operation and not allow any other thread or process to interfere in the middle. However, this fix may not be a complete solution that meets the application's partitioning requirements. For example, if the partitioning program deals with electronic component objects and their child pin objects, the grouping of the components and their pins must together be atomic to keep the parent and child objects in a consistent state.
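That encapsulation idea can be sketched with a Pthread mutex. This is my illustration, using a toy container that holds only an object count rather than the example's real container type:

```c
#include <pthread.h>

/* toy container: just an object count, enough to show the pattern */
typedef struct { int count; } container;

pthread_mutex_t partition_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hold one lock across both the remove and the add, so the transitory
 * "homeless objects" state can never be observed by any other thread
 * that also takes partition_lock before reading the containers. */
void shuffle_objects_atomic(container *src, container *dst, int nobjects)
{
    pthread_mutex_lock(&partition_lock);
    src->count -= nobjects;    /* remove from the source container */
    dst->count += nobjects;    /* add to the destination container */
    pthread_mutex_unlock(&partition_lock);
}
```

The caveat in the surrounding text applies: the lock only helps if every reader of the containers takes it too, and application requirements may force the critical section to grow well beyond these two statements.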

The program code below shows another transitory state which may produce a general race condition fault.

partition_shuffle.cpp (continued)

void shuffle_components(container* component_source, container* component_destination, element* target_components)
{
    shuffle_objects(component_source, component_destination, target_components);

    // Here is another transitory state which may cause a general race condition

    pins* target_pins = get_child_pins(target_components);
    container* pin_source = get_pin_container(component_source);
    container* pin_destination = get_pin_container(component_destination);
    shuffle_objects(pin_source, pin_destination, target_pins);
}


Therefore the atomic transaction type of requirement may extend over a long sequence of operations. But when the atomic sequence of operations becomes complex and long, it defeats the original intention of parallel programming. The big issue with general races is that they are very subtle to avoid in the first place and very hard to fix even if you are lucky enough to discover the problem. Furthermore, unlike data races, general races can occur in distributed memory parallel programming, such as MPI programs, as well as in shared memory parallel programming.

Understand The Cause Of A Race Condition Problem

Let's come back to the collect routine in the partitioning example program at the beginning. There is a third data race problem in this routine, and it is quite subtle: its cause is hard to understand without serious investigation. In the collect routine, the statement group_container.add(current_object) has a data race problem.

Almost no programmer will believe there is a data race at this statement just by looking at the code. As a matter of fact, this data race is caused by another data race, the fourth one in such a simple program.

It is easier to explain the fourth data race first. The group ID comes from the routine argument arg, which is a pointer to the loop index i in the main program. The main thread keeps advancing loop index i, writing new values to the memory location that arg points to. Therefore reading this loop index and converting it to a group ID in the collect routine is a data race. Because of this race, two different threads running the collect routine may get the same group ID value and produce the data race at the statement group_container.add(current_object).
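A common fix for this fourth race (my sketch, not from the post) is to give each thread its own private copy of the loop index instead of the address of the shared i:

```c
#include <pthread.h>

#define NTHRS 4

static int seen[NTHRS];   /* each thread marks the group id it received */

void *collect(void *arg)
{
    int group_id = *((int *) arg);   /* stable: points into ids[] below */
    seen[group_id] = 1;
    return NULL;
}

/* Pass each thread a pointer to its own slot in ids[], so advancing
 * the loop index can no longer race with the threads' reads. The
 * ids[] array must outlive the threads, hence the joins below. */
void run_collect_threads(void)
{
    pthread_t pids[NTHRS];
    int ids[NTHRS];
    int i;

    for (i = 0; i < NTHRS; i++) {
        ids[i] = i;
        pthread_create(&pids[i], NULL, collect, (void *) &ids[i]);
    }
    for (i = 0; i < NTHRS; i++)
        pthread_join(pids[i], NULL);
}
```

With one dedicated slot per thread, every thread is guaranteed a distinct group ID, which also removes the derived race at group_container.add in the original example.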


Parallel programming is a new world for most software developers, and its subtlety and complexity are far beyond sequential programming. Software developers need to use the right tools in this new world. No doubt a race detection tool is a critical tool that parallel software developers need to learn and use regularly. Although the current state of parallel developer tools is still not mature, some early tools are available today, and more engineering resources are being put into them. The current Sun Studio Express June 2006 build features a Data Race Detection Tool; you can go to the website for more information and a free download if you are interested in exploring race condition problems.

Tuesday Jun 27, 2006

What Can Software Developers Learn From Hardware Designers?

Conventionally, software development methodology and tools lead their hardware design counterparts, and many hardware designers look for solutions or new tool ideas in software design territory. For example, when chip transistor counts increased, hardware designers could no longer afford to design and simulate their designs at the transistor or gate abstraction level. The hardware people took many language constructs from C/C++ programming and introduced Verilog for RT and behavioral level design representations. With the increasing computing concurrency and horizontal scalability in the latest software development, it may be time for software developers to learn something from the existing hardware design domain.

When an engineer faces concurrent threads or logic devices, one big challenge is to avoid race condition problems. In hardware logic simulation, one main piece of functionality is to assign an unknown logic value to a logic node when logic contention occurs at that node. In multi-threaded programming, a shared memory location may suffer similar contention when one thread writes data different from the data read or written by another thread at that location. The data race detection tool in the upcoming Sun Studio express release will perform run-time checking to detect potential data race problems in an OpenMP or MT program. Hardware designers also apply static timing analyzers, logic checkers and formal proofs to detect critical timing and design logic faults statically. LockLint from Sun Studio is a similar static checking tool in the software developer tool space. However, there could be more powerful static software tools to detect race conditions, parallel scalability bottlenecks and distributed computing communication issues. No doubt such static software tools will be more complex than the hardware tools. For example, the Binary Decision Diagram (BDD) is an important infrastructure for representing hardware circuitry efficiently for formal verification; this kind of efficient logic representation cannot be done for software programs. Furthermore, it is extremely hard to do sufficient data dependency analysis for a C/C++ program with pointers and templates. Even so, some software researchers are beginning to learn from hardware formal verification techniques to design new software tools.

Debugging is a big challenge for distributed computing software, which ranges from MPI applications to the SOA world. A traditional software debugging tool becomes inadequate in dealing with many concurrent processes interacting among one another: it is very hard to set up a software environment that can halt and control many concurrent processes and still have the transparency to look into the process states. Design For Testability (DFT) methodology and the related tools have a long history in the hardware design world. It will not surprise me if the software developer community embraces DFT to meet the debugging needs of distributed computing software in the near future.

Thursday Jan 05, 2006

When Artificial Intelligence Becomes Smarter than the Human Brain

I recently read the book “The Singularity Is Near: When Humans Transcend Biology” by Ray Kurzweil. The book claims that, due to the exponential rate of technology growth, the time when artificial intelligence is smarter than the human brain will come in just two to three decades. Personally, I think the prediction is too optimistic. However, my real question is whether this is good for human society and civilization. The holy grail of computer science is to make the computer create and self-improve programs automatically, but very few people fully understand the consequences of computing machines becoming smarter than humans. I ponder the simple case of doing my own job. As a software architect, I would feel thrilled to have a machine design, implement and deliver my dream products automatically. Even better, the machine could improve my architecture plan, the product concept and the marketing strategy. This leads to the very critical issue of judgment and belief. My job requires me to judge future technology and business trends, come up with a product plan and execute the plan with full belief. If a computer threw me a different product plan, how could I convince myself it is definitely the right plan and permit the computer to act on it? If the machine were more powerful than me, it surely could ignore my objection and proceed anyway. Then the situation in the Terminator movies becomes not so far-fetched. Therefore humans would need to engrave some machine ethics rules on all future intelligent computers. One of those rules is to always obey the human master(s). This comes back to the judgment issue again: the humans, with their inferior intelligence, make the final call, so the ultimate progress is still bound by that inferior intelligence. This is not good, but acceptable. The fundamental issue is the required absolute perfection of the machine ethics regulation. A trivial defect or violation of the regulation could cascade exponentially to become a singular moment destroying the entire human society. I also call that “The Singularity Is Near”.

Tuesday Jun 14, 2005

Debug a Large Scale and Complex Solaris Application with Sun Studio


Welcome to the launch of OpenSolaris

Sun Studio is a complete set of software tools with an Integrated Development Environment for Solaris application development. Currently the Sun Tools organization is planning to let registered OpenSolaris participants use Sun Studio 10 for free. Here I would like to discuss some practical issues of debugging an application and how OpenSolaris participants can use the free Sun Studio to deal with their Solaris application development. This blog only covers a small set of the debugging features in Sun Studio. There is much more in the tool set, waiting for you to download and try out yourself.


Debugging usually occupies a significant percentage of the software development cycle time. Experienced developers tend to design and implement their code defensively to reduce the debugging pain, but very rarely can developers eliminate the need for debugging once the program reaches a modest size or complexity. The good thing about the Java programming language is that it lets developers focus more on design logic issues rather than subtle programming problems such as memory management. Java also induces a developer to think about and design the program in a more object-oriented way, which makes the code easier to debug. However, the ultimate code quality is determined by the programmer's skill and experience rather than the programming language. A bad programmer can produce poisonous code that pollutes all the modules integrated with it and makes the entire program very hard to debug. Fortunately, the software development community more often consists of competent programmers who produce good quality code with a very clear logic flow. When a good programmer encounters a program fault, the first reaction is to visualize and trace the program's logic flow mentally to analyze the nature of the problem. For a simple bug, the cause may be located easily without a diligent debugging effort. When the target program becomes more complex, or is developed by a big team, the programmer needs to rely on a good debugging tool and a sound debugging method to investigate and nail down the bug. Here are some practical tips for Sun Studio users going after a subtle bug in a large scale and complex Solaris application.

Debugging Process

The process of debugging can be divided into three main steps:

  1. Analyze and understand the problem symptom(s) fully.

  2. Trace the programming logic backward from the place of the symptom(s) to the place of the root cause. In some cases a symptom is caused by the chain reaction of several bugs, so this tracing step may require several iterations to go through every bug cause in the chain.

  3. Find an optimal way to fix the problem, then rebuild and verify the target program. If the newly built program doesn't work in the expected way, go back to step 1 or step 2 and repeat the process.

It may take several debugging cycles to fix a really subtle problem. In general, the second step of tracing the logic is the most challenging and time-consuming step of the entire debugging process. Therefore our discussion here will focus on how to leverage Sun Studio to perform this second step.

The Challenge of Debugging a Large Scale and Complex Program

When the target program is large in scale and has complex logic, the programmer needs to overcome several big barriers while debugging it. Some of the most significant problems are listed below:

  1. The program is part of a very complicated software runtime environment and cannot run independently.

  2. The program fault occurs only after a very long run, say hours or even days.

  3. The programming logic distance from the root cause to the symptom may be very long.

  4. The program size may be so big that the debugging tool runs very slowly, especially with breakpoints.

  5. The call stack becomes very deep and hard to follow, especially within deeply recursive routines.

  6. The program is implemented by a large team, so no single programmer understands all the code.

  7. The program shows symptoms that seem to be caused by corrupted memory or a memory leak.

How To Leverage Sun Studio Debugging Tool

Here are some tips you may consider applying when you use Sun Studio to debug a Solaris program that has one or more of the barriers described above.

Short Cut the Setup Effort

In debugging, the first move is to get to the place of the symptom to examine all the suspicious data. If you already have a core file, you can just load it into Sun Studio to examine the faulty data immediately. Eventually you still need to run the debugged code to the symptom place, or nearby. If you encounter barrier 1 or 2, you can run the target program from the command window in regular mode. A little before the program reaches the desired state, run the attach command from Sun Studio. A process window will pop up listing all the processes on the host machine. You can select the target process to load it into the debugger and begin your debugging work. If your program has the barrier 1 issue and fails at a very early stage, you can utilize the Sun Studio command ss_attach to run your target program within any runtime environment. When ss_attach executes, it initiates a connection to Sun Studio and loads the program into the Sun Studio debugger for you to debug.

Set the Breakpoint and Watch Wisely

A breakpoint lets you stop the program at a critical place to examine the data and program state. A watch shows the critical data values throughout the program's progress. Sun Studio supports both features, coupled with conditional-clause capability. When you encounter barrier 3, 4, or 5, you should think through the entire program flow to decide which program sections are the critical regions to investigate and which variables are critical for checking the correctness of the program state. With good analysis and planning, you can set breakpoints at the most critical places and set watches on the most critical variables. A conditional breakpoint is very powerful; you can apply it to filter out lots of redundant breakpoint hits. However, it is costly to run a large scale program with a conditional breakpoint, or with many regular breakpoints. Therefore you should apply as few simple breakpoints as possible to pass through the safe regions, and apply the conditional or more complex breakpoints only to go through the dangerous or uncertain areas.

Make Use of Multi-Session Feature

Sun Studio's multi-session feature allows you to debug multiple program instances simultaneously. If your program fails only on a specific configuration/platform or on certain test cases, it should be easy to find a good program instance to use as a reference. Comparing a faulty program instance against a good reference instance is a very efficient way to debug the code. For example, this method is very effective for dealing with a parallel program that succeeds in one configuration (say, a single thread running sequentially) and fails in another (multiple threads). It is also very effective for people who are not familiar with the code, as described in barrier 6. You can load multiple program processes into Sun Studio and debug a faulty process by comparing it with a reference process. Although you may think this is the same as running multiple debugging sessions side by side, the multi-session feature in Sun Studio provides a better environment by making memory space, monitor window space, and action control more efficient. The other usage of this feature is a little subtle and may be controversial: you can load multiple process instances of a single program and control these different processes to run at different stages simultaneously. This provides a simultaneous view of the program states at different stages, letting you control the critical program steps and examine the critical data in a pipelined way instead of running the program repeatedly. When you encounter barrier 5 or 6, you can experiment with this method to debug your complex program logic.

Runtime Memory Access and Leakage Checking

It is quite likely for a large scale C/C++ program to have memory related problems, such as a memory area corrupted by a wild pointer, or a routine that allocates a chunk of memory and forgets to free it before returning. It is not easy to locate this kind of memory problem with traditional debugging methods. Fortunately, Sun Studio includes a high quality runtime memory checking tool to help you find memory related bugs. When you encounter the barrier 7 problem, you can load the target program into the debugger and turn on memory access checking or memory usage checking through the Debug Memory menu item in Sun Studio. It will pinpoint the exact violating code in the source code window. However, when your program is big, the runtime memory checking will be quite costly. You should find small test cases with good coverage of your program to run the memory checking on.

The Non-Tool Tip

When you have struggled for long hours and still made no progress going after a mysterious bug, it may be time to take a good break, cool down, and think through your debugging process. Start from the problem symptom and double check that you fully understand the symptom and its implications. Next, try to explain to yourself the basic program logic flow from the beginning to the place where the symptom occurs, and decide whether there is a more efficient approach to the problem. It is always a good idea to discuss your puzzle or debugging approach with another person, even if you think that person would not really understand what you do. Quite often you will find the logical blind spot yourself while explaining it to other people.


Monday May 23, 2005

Java and Native Programming Languages

I still remember the painful days of developing my first commercial software at Daisy Systems in the early 80s. We developed software in PL/1 on the Daisy proprietary workstation called Logician, which sold for around $150,000 per unit. A single I/O task such as printing a text line required 8-9 routine calls. Even worse was the debugging capability: we needed to punch the assembly code onto a tape and load the tape into ICE machines for debugging. There were around 100 programmers in our building sharing 3 ICE machines, so I usually signed up for ICE debugging time around midnight during intensive debugging periods. That was my first experience of programming for something serious. You can guess my productivity at that time.

Several years later, in another job at a start-up company called IKOS, C/C++ was our primary development language. Compared with PL/1 and Pascal at Daisy, programming in C/C++ felt like a transition from riding a bike to riding a motorcycle. It ran so fast, but it also hurt badly when it fell. We spent a very big portion of our time debugging instead of developing new code. If we could have delivered the IKOS software a little faster with a little better quality, its IPO market cap would have been different and my working years might have counted for less.

Therefore when I first learned the Java language and saw no wild pointers and no memory corruption, my first reaction was to learn it hard and take advantage of it in my future programming life. I implemented several small projects in Java. The experience was exactly what I expected: I rarely spent much time debugging, and the code would function well once it passed the compilation stage. However, so far I have never had a chance to manage or develop a large scale application in Java only. If I needed to develop a large scale application in the future, honestly I don't know if I could afford to develop the entire software in Java either. The critical productivity issue, to me, comes down to programming style and programmer competency more than to the language itself. In past decades, people like me learned from experience to write programs defensively, and tools such as Sun Studio Runtime Checking and Rational Purify help too. But most importantly, my decision on a programming language would be based on business reasons and the technology trend. Some main stream industries, such as manufacturing, have invested millions and millions of lines of native code in basic infrastructures and product components. Application software vendors cannot ignore all this existing big investment; they must leverage or salvage this legacy code. In some cases it may be appropriate to wrap the native code into a service module to fit into an SOA environment, but that only occurs at the higher functional level that provides the services. People still need to develop lots of plumbing native code to connect the basic code units. Some programs use interpretive languages such as Perl and Python for such plumbing work, but it takes the same native programming language to tie things up when they were not originally designed to connect.

The emerging technology trend of parallel computing, from CMT machines to clusters of machines, reinforces my view that many main stream industries cannot afford to develop the entire application software in Java. No doubt Java and the JVM can benefit significantly from CMT: Java threads can be mapped onto and execute on multiple processor strands or cores of a CMT machine. But many application developers will feel more comfortable creating and manipulating the threads directly. An industry standard such as OpenMP is a good example of how software people can write multi-threaded programs in both C/C++ and FORTRAN very efficiently. Interconnects with near zero latency and high bandwidth now make running an application across multiple machines very appealing. SOA also rides on this cluster technology trend, but the interactions among web servers are loosely coupled: the boundary lines of service modules must be at a higher level and the transactions cannot be too frequent. A high performance application cannot be parallelized in this way. Another industry standard, MPI (Message Passing Interface), has become quite popular in the native programming world for parallelizing a large scale application across up to thousands of computing nodes.

If you have been patient enough to read through to here, you may think I am biased toward native programming. Not quite. I really like Java's simpler and cleaner language semantics. It extracts better programming productivity out of a mediocre programmer with less management hassle. Java's extensive API libraries, its community software infrastructure, and its universal portability will trim an application's development cost very significantly. I would still consider Java first when starting an application software project. Here I just ask the question of whether I can afford to design the entire software project in Java in many circumstances, especially for parallel computing applications.


