By Alan Hargreaves-Oracle on Jun 27, 2011
(copied from my wordpress blog).
So start 95% of the performance calls that I receive. They usually continue something like:
"I have gathered some *stat data for you (e.g. using the guds tool from Document 1285485.1). Can you please root cause our problem?"
So, do you think you could?
Neither can I. Based on this alone, my answer inevitably has to be "No".
Given this kind of problem statement, I have no idea about the expectations, the boundary conditions, or even the application. The answer may as well be "Performance problems? Consult your local Doctor for Viagra". It's really not a lot to go on.
So, what kind of problem description is going to allow me to start work on the issue that is being seen? I don't doubt that there really is an issue; it just needs to be pinned down somewhat.
What behavior exactly are you expecting to see?
Be specific and use business metrics, for example "run-time", "response-time" and "throughput".
This helps us define exit criteria.
Now, let's look at the system that is having problems.
How is what you are seeing different? Use the same type of metrics.
The answers to these two questions take us a long way towards being able to work a call.
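To make the two questions concrete, here is a minimal sketch of how a problem statement might be written down as expected versus observed values of business metrics. It is only an illustration: the class name MetricGap, its fields, and all of the numbers are hypothetical and are not part of any Oracle tool or support process.

```python
from dataclasses import dataclass


@dataclass
class MetricGap:
    """One business metric: what is expected versus what is observed."""
    name: str          # e.g. "response-time"
    unit: str          # e.g. "seconds"
    expected: float    # the value at which the problem counts as solved
    observed: float    # the value seen on the system having problems
    lower_is_better: bool = True  # set to False for metrics such as throughput

    def meets_expectation(self) -> bool:
        """Exit criterion: has the observed value reached the expected one?"""
        if self.lower_is_better:
            return self.observed <= self.expected
        return self.observed >= self.expected

    def describe(self) -> str:
        status = "OK" if self.meets_expectation() else "PROBLEM"
        return (f"{self.name}: expected {self.expected} {self.unit}, "
                f"observed {self.observed} {self.unit} [{status}]")


# Hypothetical numbers, purely for illustration.
statement = [
    MetricGap("response-time", "seconds", expected=2.0, observed=9.5),
    MetricGap("throughput", "orders/hour", expected=1200.0, observed=1350.0,
              lower_is_better=False),
]

for gap in statement:
    print(gap.describe())
```

A statement in this shape says both what is wrong (response-time) and what "fixed" would look like, which is exactly what "my system is slow" leaves out.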
Even more helpful are answers to questions like:
Has this system ever worked to expectation?
If so, when did it start exhibiting this behavior?
Is the problem always present, or does it sometimes work to expectation?
If it sometimes works to expectation, when are you seeing the problem? Is there any discernible pattern?
Is the impact of the problem getting better, worse, or remaining constant?
What kind of differences are there between when the system was performing to expectation and when it is not?
Are there other machines where we could expect to see the same issue (e.g. similar usage and load), but do not? Again, what are the differences?
Once we start to gather information like this, we start to build up a much clearer picture of exactly what we need to investigate, and what we need to achieve so that both you and I agree that the problem has been solved.
Please help get that figure of poorly defined problem statements down from its current 95% value.