A plea to help support help you!

This week, I am the duty engineer for Kernel calls within the PTS Organisation during Asia-Pacific coverage hours.

Yesterday I received a callout on an escalation that had gone around the world a number of times and had been open since early July. The guts of the issue (as we had been told) was that the customer was setting the quotas of new users and the amount of disc used was being initialised to a rather large non-zero value.

There had been an awful lot of work done to try to determine where the values were initialised both in edquota and in the kernel ioctl() - which is why the Kernel group was involved.

Anyway, the news that came to light yesterday was that the customer was not using edquota. They were using some code that they had written themselves. The code itself looked fine. What was wrong was an assumption that they had made: that variables allocated on the stack are zero-filled.

They are not.

They contain whatever data happened to be on the stack at the time the function was called. So the bad value that was being written into the quota ioctl() was the result of uninitialised data.
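To make the trap concrete, here is a minimal sketch in C. It is not the customer's code, and it does not use the real quota structure or ioctl(): struct quota_rec and new_user_quota() are names I have made up purely for illustration. The point is only that a structure declared on the stack has to be cleared explicitly before you fill in the fields you care about.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record, standing in for whatever structure a quota call takes. */
struct quota_rec {
    unsigned long soft_limit;   /* soft limit, in blocks */
    unsigned long hard_limit;   /* hard limit, in blocks */
    unsigned long cur_blocks;   /* blocks currently in use */
};

/* Build the record for a brand-new user (illustrative helper, not a real API). */
static struct quota_rec new_user_quota(unsigned long soft, unsigned long hard)
{
    struct quota_rec q;         /* on the stack: contents are indeterminate here */

    assert(hard >= soft);       /* sanity check on the caller's arguments */

    /*
     * Clear the whole structure first. Without this, cur_blocks would hold
     * whatever happened to be on the stack, and that junk is what would be
     * handed on to the kernel.
     */
    memset(&q, 0, sizeof q);

    q.soft_limit = soft;
    q.hard_limit = hard;
    return q;
}
```

With the memset() in place, new_user_quota(1000, 1200) returns a record whose cur_blocks is zero. Declaring the variable as struct quota_rec q = {0}; would zero the members just as well.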

The point of this plea is simple.

Once we had all of the relevant data, this call was solved within hours. If this information had been passed to us when the call had been opened, it would have been closed just as quickly - at the beginning of July, rather than the end of August.

It is critical that you give us the full picture when you log support calls with us. In this call we had been working under the assumption that the problem was being exhibited using edquota. As a result, a lot of man-weeks were effectively wasted, and the customer did not see a solution until almost two months after the call had been placed.

We are currently pushing a process called Sun Global Resolution towards our customer-facing folk. This process is based around the Kepner-Tregoe Analytical Troubleshooting process. An early part of this process is to really define the expected behaviour and the deviation from it, along with all of the concerns and background information. If you start being asked a whole lot of questions whose relevance to your problem you don't immediately see, hang in there. This information really does help us get to the bottom of problems faster.

About

* - Solaris and Network Domain, Technical Support Centre


Alan is a kernel and performance engineer based in Australia who tends to have the nasty calls gravitate towards him
