An Oracle blog about Openomics

  • sun
    May 8, 2009

Portrix debugs live VoIP service using DTrace

Guest Author

Hamburg-based Portrix provides and hosts the telephony software SmartDialer for a large German call center. SmartDialer is based on a Solaris port of the open-source Asterisk package. The application performs very well on the Sun Fire T2000 chosen for deployment --Solaris outperforms Linux at running Asterisk, and SPARC CMT processors are ideally suited to highly multithreaded applications like Asterisk. In production, however, SmartDialer experienced stability issues after a couple of days of uptime, and a very high system time --up to 60%-- was seen on the machine. As a consequence, the quality of the VoIP connections dropped to the point where a reboot of the server was necessary. Oddly, restarting the application alone did not help.

As a member of the Sun Startup Essentials program, Portrix called upon Sun for additional guidance on the issue; this is how our ISV Engineering team got involved in this debugging effort. Unfortunately, there was no adequate load generator for the Portrix VoIP applications --one would have enabled us to reproduce the issue on a separate server-- so all analysis had to be done on the live production system. Quite a scary thought, as traditional debugging and profiling tools attach to the running process and momentarily stop it as part of the procedure. In the best case, that means a call is interrupted for a second or two; in the worst case, the call is dropped, which was unacceptable to the customer.

Enter DTrace. DTrace is a comprehensive dynamic tracing framework for the Solaris operating environment. It provides a powerful infrastructure that permits administrators and developers to concisely answer arbitrary questions about the behavior of the operating system and user programs. The Solaris 10 kernel contains over 30,000 DTrace probes. Probes are like sensors scattered throughout the operating system, e.g. at useful execution points like the entry and exit of a system call. Each has a unique ID and a human-readable name. These lightweight probes can be turned on on demand to collect data, e.g. the name of the process that enters a given system call.

The key point, in the context of Portrix, is that Solaris 10 comes with DTrace built in. There is no need to patch the system, which could alter the out-of-the-box security and stability of Solaris. To use DTrace, all you do is selectively turn on probes, which adds minimal overhead to the running system. In the example below, I am turning on the probe syscall::read:entry --the entry point of the read(2) system call-- using the dtrace(1M) command; you will need specific privileges, or to log in as root, to issue the command.

# dtrace -n 'syscall::read:entry {}'
dtrace: script 'read.d' matched 1 probe
CPU ID FUNCTION:NAME
0 55925 read:entry
0 55925 read:entry
0 55925 read:entry
^C

Here, a simple line is printed every time the probe fires, which is the default behavior when no action is requested of a probe. When you do request one, the action goes between the { } after the probe name and is written in the D scripting language. Typical actions are to record the name of the process that fires the probe, how many times it does so, what the parameters are, etc. In other words, you answer the questions you have one after the other, to root-cause any issue on your system step by step.
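
As a minimal sketch of such an action, the shell snippet below saves a small D script that counts read(2) calls per process name. The file name readcount.d and the aggregation name @reads are illustrative choices, not taken from the Portrix session.

```shell
# Write a small D script to a file (readcount.d is a hypothetical name).
cat > readcount.d <<'EOF'
#!/usr/sbin/dtrace -s

/* Fires on every entry into the read(2) system call. */
syscall::read:entry
{
        /* execname is the name of the process that fired the probe; */
        /* count() tallies how many times each name was seen.        */
        @reads[execname] = count();
}
EOF
chmod +x readcount.d
```

Run as root (or with the required DTrace privileges) on the Solaris host, ./readcount.d traces until interrupted with Ctrl-C, at which point DTrace prints the per-process counts.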

In the case of Portrix, we were interested in a time profile of SmartDialer, similar to what Sun Studio's Performance Analyzer produces --though the Analyzer attaches to the process, in a more intrusive fashion. We wanted to know what was causing the high system time.

We ran the following command lines to print the hottest functions and their call stacks:

# dtrace -n 'profile-10ms{@a[func(arg0)]=count()} END{trunc(@a,100)}'
# dtrace -n 'profile-10ms{@a[stack()]=count()} END{trunc(@a,100)}'
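
For readability, the first of these one-liners can also be written out as a commented script; this is only a sketch. The file name hotfuncs.d and the /arg0/ predicate are additions that do not appear in the commands above: for profile probes, arg0 holds the kernel program counter and is zero when the sample lands in user code, so the predicate restricts the counts to kernel time.

```shell
# Save a commented version of the profiling one-liner (hotfuncs.d is a hypothetical name).
cat > hotfuncs.d <<'EOF'
#!/usr/sbin/dtrace -s

/* Sample every CPU once every 10 milliseconds,          */
/* keeping only samples taken while running in the kernel. */
profile-10ms
/arg0/
{
        /* func(arg0) maps the sampled kernel address to its function. */
        @a[func(arg0)] = count();
}

/* On exit (Ctrl-C), keep only the 100 most frequently sampled functions. */
END
{
        trunc(@a, 100);
}
EOF
chmod +x hotfuncs.d
```

Replacing func(arg0) with stack() as the aggregation key gives the call-stack variant of the second command.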

From their output, it was clearly visible that the real-time driver ztdummy was the culprit. This explains why a restart of the application did not solve the problem: the application was restarted, but not the driver. This kernel module, needed by Asterisk on Linux, had been ported to Solaris by Portrix as well. It turns out, however, that this driver is not needed on Solaris. Indeed, Solaris comes with a native real-time timer of nanosecond resolution built in --a little-known fact is that Solaris has for years offered a best-in-class real-time platform among general-purpose operating systems. The recommended course of action for Portrix was thus to rebuild the application without the ztdummy driver and leverage the Solaris real-time capabilities directly.
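
One hypothetical way to make a finding like this stand out, not necessarily what was run on the Portrix system, is to attribute the kernel samples to modules rather than to individual functions. The sketch below uses the D subroutine mod(); the file name hotmods.d is an assumption.

```shell
# Save a per-module profiling script (hotmods.d is a hypothetical name).
cat > hotmods.d <<'EOF'
#!/usr/sbin/dtrace -s

/* Sample every CPU every 10 ms; keep kernel samples only. */
profile-10ms
/arg0/
{
        /* mod(arg0) yields the name of the kernel module that */
        /* contains the sampled address.                       */
        @m[mod(arg0)] = count();
}
EOF
chmod +x hotmods.d
```

Since DTrace prints aggregations in ascending order of value by default, a driver that dominates system time would be expected to appear at the bottom of the output with the highest count.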

In conclusion, using DTrace, we could:

  • quickly root-cause the abnormal behavior of the SmartDialer application on Solaris;
  • perform the analysis on a live production system without patching it and/or installing additional software;
  • perform the analysis on a mission-critical application without impacting either its performance or its functionality.

DTrace is really unique today. As a software developer, service engineer or performance specialist, if DTrace is not yet part of your standard toolkit, I recommend you take a closer look at it now.

Comments ( 3 )
  • Frederic Pariente Tuesday, May 19, 2009
  • mitushi Thursday, August 20, 2009

    I want blogroll link on your site blogs blogs.sun.com

    I have found your site during googleing and I like your blog blogs.sun.com

    I want advertise my site as blogroll, friends site or sponsored links.

    Site introduction

    Vmukti is video streaming and conference Software Company from India and we are also providing live broadcasting.

    My link information as under.

    Title: Video Streaming

    URL: http://www.vmukti.com

    We will provide backlink to you.

    If you want to go ahead let me know.

    Regards,

    Mitushi M

    www.vmukti.com


  • Frederic Pariente Thursday, August 20, 2009

    I'm glad you like the blog, thanks. Unfortunately I limit the blog roll to other Sun ISV Engineering blogs, to keep it manageable and consistent for the readers. No commercial links either.

    I encourage you to join the Sun Startup Essentials program in India though; you can get advertising this way, and it may eventually get us to meet again --e.g. for a success story of Vmukti on Solaris, Sun Systems or Sun OpenStorage in the future. Visit my colleague Aparna at http://blogs.sun.com/indiapanorama/ next.

