Debugging zones using dtrace
By John Clingan-Oracle on Aug 17, 2006
I got an email from a Sun customer who was having problems with zones and wanted ideas as to why the host was performing so slowly with a boatload of zones. The customer had noticed svc.configd consuming quite a bit of CPU time in each zone and had disabled as many services as he felt safe disabling, yet the problem persisted.
The bad news is that I missed the email and it sat idle in my rather large inbox for far too long before I responded (apologies, Mr. Customer). The good news is that the customer figured out the problem using DTrace.
I must say that I've enjoyed cramming wwwaayyy too many zones into wwaayyy too small a space. It was a fun thing for my feeble mind to do, and I learned quite a bit along the way. When, from a zones perspective, the Ultra 10 was nearing the Twilight Zone, I wanted to figure out what the bottleneck was, so I turned to DTrace. What's even better is that this customer used the same approach to help debug his problem.
Due to a (logged) "headless system" bug in Nevada, the Xserver was continually trying to restart. Multiply that by N zones and you can probably see the performance impact: think over 100,000 times a minute, on a low-end system. Unfortunately, the issue was not showing up in prstat, much like Bryan's well-known stock ticker experience. DTrace can make problems such as this apparent quickly.
Update: More accurately, it was the cde-login service, not the Xserver, that was continually trying to restart.
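A process that starts and dies between prstat samples can be nearly invisible to prstat, whereas DTrace's proc provider fires on every exec. As a rough sketch (this is not the customer's actual script, which isn't shown here), one could run a one-liner such as `dtrace -n 'proc:::exec-success { @[zonename, execname] = count(); }'` for a fixed number of seconds, stop it with Ctrl-C, and feed the aggregation it prints into the hypothetical Python helpers below. They assume each data line has three whitespace-separated columns (zone, program, count), total each program across zones, and scale the result to a per-minute rate, which is where the "multiply by N zones" effect becomes visible. The sample numbers are made up purely for illustration.

```python
# Hypothetical helpers (not from the original post) for summarizing the
# aggregation printed by a DTrace one-liner such as:
#   dtrace -n 'proc:::exec-success { @[zonename, execname] = count(); }'
# Assumes each data line is "zone program count", whitespace-separated.
from collections import defaultdict


def parse_aggregation(text):
    """Return {(zone, program): count} from DTrace aggregation text."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and dtrace banners such as
        # "dtrace: description '...' matched 1 probe".
        if len(fields) != 3 or not fields[2].isdigit():
            continue
        zone, program, count = fields
        counts[(zone, program)] = int(count)
    return counts


def per_program_rate(counts, seconds):
    """Map program -> (number of zones it ran in, execs per minute across all zones)."""
    totals = defaultdict(int)
    zones = defaultdict(set)
    for (zone, program), count in counts.items():
        totals[program] += count
        zones[program].add(zone)
    return {p: (len(zones[p]), totals[p] * 60 / seconds) for p in totals}


if __name__ == "__main__":
    # Made-up sample: a 10-second run across two zones plus the global zone.
    sample = """
dtrace: description 'proc:::exec-success ' matched 1 probe
  global   ls        2
  zone1    dtlogin   400
  zone2    dtlogin   400
"""
    rates = per_program_rate(parse_aggregation(sample), seconds=10)
    # Print the busiest programs first.
    for program, (nzones, per_min) in sorted(
        rates.items(), key=lambda item: item[1][1], reverse=True
    ):
        print(f"{program:10} {nzones:3} zone(s) {per_min:10.0f} execs/min")
```

Summing across zones rather than looking at one zone at a time is the point of the design: a respawn rate that looks tolerable in a single zone becomes significant once it is repeated in every zone on the box.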
This customer experience points to an unfortunate pattern (a sysadmin anti-pattern?). This is not the first customer to run into the Solaris services problem. Many customers new to Solaris don't know which services should be running, so unless it's obvious, they leave (mostly unneeded) services running. In this case, it was cde-login (in every zone). To me, this experience reinforces the value of the OpenSolaris Secure by Default project, which makes it easy to disable all services (except ssh) by default so that you can then enable only the ones you need.