Debug a Large Scale and Complex Solaris Application with Sun Studio
By swdeveloper on Jun 14, 2005
Welcome to the launch of OpenSolaris
Sun Studio is a complete set of software tools, with an integrated development environment, for Solaris application development. The Sun Tools organization currently plans to give registered OpenSolaris participants the right to use Sun Studio 10 for free. Here I would like to discuss some practical issues in debugging an application and how OpenSolaris participants can use the free Sun Studio for their Solaris application development. This blog covers only a small set of Sun Studio's debugging features. There is much more in the tool set, waiting for you to download and try out yourself.
Debugging usually takes up a significant share of software development time. Experienced developers tend to design and implement their code defensively to reduce the debugging pain, but they can rarely eliminate the need for debugging once a program reaches a modest size or complexity. One good thing about the Java programming language is that it lets developers focus on design logic rather than on subtle programming problems such as memory management. Java also encourages a developer to think and design in a more object-oriented way, which makes the code easier to debug. However, the ultimate code quality is determined by the programmer's skill and experience rather than by the programming language. A bad programmer can produce poisonous code that pollutes every module integrated with it and makes the entire program very hard to debug. Fortunately, competent programmers who consistently produce good-quality code with a clear logic flow are more common in the software development community.
When a good programmer encounters a program fault, the first reaction is to visualize and trace the program logic flow mentally to analyze the nature of the problem. For a simple bug, the cause can often be located without much debugging effort. When the target program becomes more complex or is developed by a big team, the programmer needs to rely on a good debugging tool and a sound debugging method to investigate and nail down the bug. Here are some practical tips for Sun Studio users going after a subtle bug in a large-scale, complex Solaris application.
The process of debugging can be divided into three main steps:
1. Analyze and fully understand the problem symptom(s).
2. Trace the program logic backward from the place of the symptom(s) to the place of the root cause. In some cases, a symptom is caused by a chain reaction of several bugs, so this tracing step may take several iterations to work through every bug in the chain.
3. Find an optimal way to fix the problem, then rebuild and verify the target program. If the newly built program doesn't work as expected, go back to step 1 or step 2 and repeat the process.
It may take several debugging cycles to fix a really subtle problem. In general, the second step, tracing the logic, is the most challenging and time-consuming part of the entire debugging process. Our discussion here will therefore focus on how to leverage Sun Studio to perform that second step.
The Challenge of Debugging a Large Scale and Complex Program
When the target program is large and its logic is complex, the programmer has to overcome several big barriers in debugging it. Some of the most significant are listed below:
1. The program is a part of a very complicated software runtime environment and cannot run independently.
2. The program fault occurs after a very long run, say hours or even days.
3. The distance in program logic between the root cause and the symptom may be very long.
4. The program size may be so big that the debugging tool runs very slowly, especially with breakpoints.
5. The call stack becomes very deep and hard to follow, especially within a deeply recursive routine.
6. The program is implemented by a large team, and no single programmer understands all of the code.
7. The program shows symptoms that seem to be caused by corrupted memory or a memory leak.
How to Leverage the Sun Studio Debugging Tools
Short Cut the Setup Effort
In debugging, the first move is to get to the place where the symptom appears and examine all the suspicious data. If you already have a core file, you can load it into Sun Studio and examine the faulty data immediately. Eventually, though, you still need to run the program under the debugger to the symptom or somewhere near it. If you encounter barrier 1 or 2, you can run the target program from a command window in the regular way. Shortly before the program reaches the desired state, run the attach command from Sun Studio. A process window pops up listing all the processes on the host machine, and you can select the target process to load it into the debugger and begin debugging. If your program has the barrier 1 issue and fails at a very early stage, you can use the Sun Studio command ss_attach to run your target program within any runtime environment. When ss_attach executes, it initiates a connection to Sun Studio and loads the program into the Sun Studio debugger for you to debug.
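When a program fails very early and you still prefer to attach by hand, one common trick is to make the program pause at startup until a debugger arrives. The sketch below is only an illustration of that idea, not how ss_attach works; the function name, the flag, and the APP_WAIT_FOR_DEBUGGER environment variable are all hypothetical.

```c
#define _POSIX_C_SOURCE 200112L  /* for getpid(), sleep(), and setenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical flag: after attaching, set this to 0 from the debugger
 * and continue. It is volatile so the loop re-reads it on every pass. */
static volatile int wait_for_debugger = 1;

/* Pause at startup so a debugger can attach before the early failure.
 * Waits only when the (hypothetical) APP_WAIT_FOR_DEBUGGER variable is
 * set, and never longer than max_seconds. Returns the seconds waited. */
int pause_for_debugger(int max_seconds)
{
    int waited = 0;

    assert(max_seconds >= 0);
    if (getenv("APP_WAIT_FOR_DEBUGGER") == NULL)
        return 0;                       /* normal runs are not delayed */

    fprintf(stderr, "pid %ld waiting for a debugger (up to %d s)\n",
            (long)getpid(), max_seconds);
    while (wait_for_debugger && waited < max_seconds) {
        sleep(1);
        waited++;
    }
    return waited;
}
```

Calling pause_for_debugger(60) as the first statement of main gives you up to a minute to pick the printed process ID from the attach window. The timeout keeps a forgotten environment variable from hanging the program forever.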
Set the Breakpoint and Watch Wisely
A breakpoint lets you stop the program at a critical place to examine the data and program state. A watch shows the values of critical data as the program runs. Sun Studio supports both features, and both can be combined with a conditional clause. When you encounter barrier 3, 4, or 5, think through the entire program flow to decide which program sections are the critical regions to investigate and which variables are critical for checking that the program state is correct. With good analysis and planning, you can set breakpoints at the most critical places and watches on the most critical variables. Conditional breakpoints are very powerful: you can use them to filter out many redundant breakpoint hits. However, it is costly to run a large-scale program with a conditional breakpoint, or even with many regular breakpoints. Therefore, use as few simple breakpoints as possible to pass through the safe regions, and apply conditional or more complex breakpoints only in the dangerous or uncertain areas.
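One way to keep the cost of a conditional breakpoint down, if you can rebuild the program, is to compile the condition into the code and put a plain breakpoint on a small hook function that is called only when the condition holds. The following is a minimal sketch under that assumption; the order structure, the "negative amount" condition, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type, used only for illustration. */
struct order {
    int  id;
    long amount;
};

/* Counts how often the hook ran; the side effect also keeps the
 * compiler from optimizing the call away. */
static unsigned long suspicious_hits = 0;

/* Debug hook: set one simple, unconditional breakpoint here. */
void debug_suspicious_order(const struct order *o)
{
    (void)o;                 /* inspect *o from the debugger */
    suspicious_hits++;
}

/* Sums the order amounts. The suspicious condition is tested by the
 * program itself, so the debugger stops only on the rare bad record. */
long total_orders(const struct order *orders, size_t n)
{
    long total = 0;
    size_t i;

    assert(orders != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (orders[i].amount < 0)
            debug_suspicious_order(&orders[i]);
        total += orders[i].amount;
    }
    return total;
}
```

The loop itself carries no breakpoint, so the safe records pass through without any debugger involvement. The trade-off is that the hook must be added and later removed, or compiled out, by hand.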
Make Use of Multi-Session Feature
Sun Studio supports a multi-session feature that lets you debug multiple program instances simultaneously. If your program fails only on a specific configuration or platform, or only on some test cases, it should be easy to find a working program instance to use as a reference. Comparing a faulty program instance with a good reference instance is a very efficient way to debug the code. For example, this method is very effective for a parallel program that succeeds in one configuration (say, a single thread running sequentially) and fails in another (multiple threads). It is also very effective for people who are not familiar with the code, as described in barrier 6. You can load multiple program processes into Sun Studio and debug a faulty process by comparing it with a reference process. Although you may think this is the same as running multiple debuggers side by side, the multi-session feature in Sun Studio provides a better environment, making more efficient use of memory space, monitor window space, and action control. The other use of this feature is a little subtle and may be controversial. You can load multiple process instances of a single program and control them so that they run at different stages simultaneously. This gives you a simultaneous view of the program state at different stages. It lets you control the critical program steps and examine the critical data in a pipelined way instead of running the program repeatedly. When you encounter barrier 5 or 6, you can experiment with this method to debug a program with complex logic.
Runtime Memory Access and Leakage Checking
A large-scale C/C++ program is quite likely to have memory-related problems, such as a memory area corrupted by a wild pointer, or a routine that allocates a chunk of memory and forgets to free it before returning. This kind of memory problem is not easy to locate with traditional debugging methods. Fortunately, Sun Studio includes a high-quality runtime memory checking tool to help you find memory-related bugs. When you encounter the barrier 7 problem, you can load the target program into the debugger and turn on memory access checking or memory usage checking through the Debug Memory menu item in Sun Studio. It will pinpoint the exact violating code in the source code window. However, when your program is big, runtime memory checking is quite costly. You should find small test cases with good coverage of your program to run under memory checking.
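To make the "allocates and forgets to free" case concrete, here is a small sketch of the kind of leak meant above, together with a corrected version. The leaky function is deliberately buggy. The counting wrappers and all names are hypothetical and exist only so that the leak is observable in a test; they are not part of Sun Studio.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of blocks allocated and not yet freed (illustration only). */
static long live_blocks = 0;

static void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void tracked_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* DELIBERATELY BUGGY: the early return for an empty name skips the
 * free, so every such call leaks one block. */
int name_length_leaky(const char *name)
{
    int len;
    char *copy;

    assert(name != NULL);
    copy = tracked_malloc(strlen(name) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, name);
    if (copy[0] == '\0')
        return 0;                 /* BUG: copy is never freed here */
    len = (int)strlen(copy);
    tracked_free(copy);
    return len;
}

/* Corrected version: a single exit path frees the copy in all cases. */
int name_length_fixed(const char *name)
{
    int len;
    char *copy;

    assert(name != NULL);
    copy = tracked_malloc(strlen(name) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, name);
    len = (int)strlen(copy);      /* 0 for an empty name */
    tracked_free(copy);
    return len;
}
```

A leak like this hides on a rarely taken path, which is why a small test case that covers the empty-name branch is worth far more under memory checking than a long run that never reaches it.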
The Non-Tool Tip
When you have struggled for long hours and still made no progress on a mysterious bug, it may be time to take a good break, cool down, and think through your debugging process. Start from the problem symptom and double-check that you fully understand the symptom and its implications. Next, try to explain to yourself the basic program logic flow from the beginning to the place where the symptom occurs, and decide whether there is a more efficient way to go after the problem. It is always a good idea to discuss your puzzling thoughts or your debugging approach with another person, even if you think that person would not really understand what you are doing. Quite often you will find the blind spot in your logic yourself while you are explaining it to someone else.