Portrix debugs live VoIP service using DTrace

Hamburg-based Portrix provides and hosts the telephony software SmartDialer for a large German call center. SmartDialer is based on a Solaris port of the open-source Asterisk package. The application performs very well on the Sun Fire T2000 chosen for deployment: Solaris outperforms Linux at running Asterisk, and SPARC CMT processors are ideally suited to highly multithreaded applications like Asterisk. In production, however, SmartDialer experienced stability issues after a couple of days of uptime, and very high system time (up to 60%) was seen on the machine. As a consequence, the quality of the VoIP connections dropped to the point that a reboot of the server was necessary. Oddly, restarting the application alone did not help.

As a member of the Sun Startup Essentials program, Portrix called upon Sun for additional guidance on the issue; this is how our ISV Engineering team got involved in the debugging effort. Unfortunately, there was no adequate load generator for the Portrix VoIP application, which would have let us reproduce the issue on a separate server, so all analysis had to be done on the live production system. Quite a scary thought, since traditional debugging and profiling tools attach to the running process and momentarily stop it as part of the procedure. In the best case, that means a call is interrupted for a second or two; in the worst case, the call is dropped, which was unacceptable to the customer.

Enter DTrace. DTrace is a comprehensive dynamic tracing framework for the Solaris operating environment. It provides a powerful infrastructure that lets administrators and developers concisely answer arbitrary questions about the behavior of the operating system and user programs. The Solaris 10 kernel contains over 30 thousand DTrace probes. Probes are like sensors scattered throughout the operating system, placed at useful execution points such as the entry and exit of a system call. Each has a unique ID and a human-readable name. These lightweight probes can be turned on on demand to collect data, for example the name of the process that enters a given system call.
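As a rough illustration of how probes are named and numbered, the dtrace(1M) command can list them with its -l option. This sketch assumes a Solaris 10 system and sufficient privileges; probe counts and IDs vary from one system to another, so no output is shown.

```shell
# dtrace -l | wc -l
# dtrace -l -n 'syscall::read:*'
```

The first command counts the probes currently published (the count includes one header line). The second lists only the probes of the read(2) system call, showing the ID, provider, module, function and name of each.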

The key point, in the context of Portrix, is that Solaris 10 comes with DTrace built in. There is no need to patch the system, which could alter the out-of-the-box security and stability of Solaris. To use DTrace, all you do is selectively turn on probes, which adds minimal overhead to the running system. In the example below, I turn on the probe syscall::read:entry (the entry point of the read(2) system call) using the dtrace(1M) command; you will need specific privileges, or to log in as root, to issue the command.

# dtrace -n 'syscall::read:entry {}'
dtrace: description 'syscall::read:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  55925                       read:entry
  0  55925                       read:entry
  0  55925                       read:entry
^C

Here, a simple line is printed every time the probe fires, which is the default behavior when no action is requested of a probe. When you do request one, the action goes between the { } after the probe name and is written in the D scripting language. Typical actions record the name of the process firing the probe, how many times it does so, what the parameters are, and so on. In other words, you answer your questions one after another and root-cause, step by step, any issue on your system.
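To make this concrete, here is a minimal sketch of two such actions, using D aggregations to count events until you press Ctrl-C. The first counts read(2) calls per process name; the second counts all system calls made by a single process, broken down by system call. The process name "asterisk" is an assumption made for illustration, not something taken from the Portrix system.

```shell
# dtrace -n 'syscall::read:entry { @reads[execname] = count(); }'

# dtrace -n '
    /* "asterisk" is an assumed process name; substitute your own */
    syscall:::entry
    /execname == "asterisk"/
    { @calls[probefunc] = count(); }'
```

The text between slashes is a predicate: the action runs only when it evaluates to true. Aggregations are printed automatically when dtrace exits, sorted by count.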

In the case of Portrix, we were interested in a time profile of SmartDialer, similar to what Sun Studio's Performance Analyzer produces, although the latter attaches to the process and is therefore more intrusive. We wanted to know what was causing the high system time.

We ran the following two commands to print the hottest functions and the hottest call stacks, respectively:

# dtrace -n 'profile-10ms{@a[func(arg0)]=count()} END{trunc(@a,100)}'

# dtrace -n 'profile-10ms{@a[stack()]=count()} END{trunc(@a,100)}'

The output of both commands made it clear that the real-time driver ztdummy was the culprit. This explains why restarting the application did not solve the problem: the application was restarted, but the driver was not. This kernel module, needed by Asterisk on Linux, had been ported to Solaris by Portrix as well. It turns out, however, that the driver is not needed on Solaris. Indeed, Solaris comes with a built-in native real-time timer of nanosecond resolution; a little-known fact is that Solaris has for years offered a best-in-class real-time platform among general-purpose operating systems. The recommended course of action for Portrix was thus to rebuild the application without the ztdummy driver and to leverage the Solaris real-time capabilities directly.
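As a side note on the profiling commands above, here is one possible refinement, offered as a sketch under assumptions rather than as what was actually run during the investigation. In a profile probe, arg0 holds the kernel program counter and is 0 when the CPU was executing user code, so a predicate on arg0 restricts the samples to kernel time, which is what system time measures. The 30-second duration and the top-20 cutoff are arbitrary illustrative choices.

```shell
# dtrace -n '
    profile-10ms
    /arg0/                          /* keep only samples taken in the kernel */
    { @hot[func(arg0)] = count(); }

    tick-30s                        /* arbitrary: stop after 30 seconds */
    { exit(0); }

    END
    { trunc(@hot, 20); }            /* arbitrary: keep the 20 hottest functions */'
```

Exiting from a tick probe gives a fixed sampling window, which makes successive runs easier to compare than runs ended by hand with Ctrl-C.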

In conclusion, using DTrace, we could:

  • quickly root-cause the abnormal behavior of the SmartDialer application on Solaris;
  • perform the analysis on a live production system without patching it and/or installing additional software;
  • perform the analysis on a mission-critical application without impacting either its performance or its functionality.

DTrace is really unique today. As a software developer, service engineer or performance specialist, if DTrace is not yet part of your standard toolkit, I recommend you take a closer look at it now.

Comments:

For more details on this project, check out http://blogs.sun.com/performance/entry/dtrace_solution_for_mission_critical !

Posted by Frederic Pariente on May 19, 2009 at 09:25 AM CEST #
