The Blue Bridge of Death
By Steven Chan-EBS Development-Oracle on May 29, 2006
Mary Ann Davidson's recent remarks at the WWW2006 Conference in Scotland comparing civil engineering to software development have some interesting support implications for our collective community of Apps DBAs.
One of her provocative points is that people would face the Blue Bridge of Death every day if civil engineers built bridges in the same manner in which we build software.
This raises the question: why can't we make ERP software as reliable as a bridge?
The short answer: A bridge is constructed with concrete and reinforcing steel, and has a finite number of possible failure points, each of which can be comprehensively tested. There is no software equivalent, so comprehensive testing is impossible.
An Effectively Infinite Set of Possibilities
Let's try to define the problem with some simplifying assumptions. There are approximately:
- 200 E-Business Suite modules
- 10 E-Business Suite 11i releases (e.g. 11.5.0, 11.5.1, etc.)
- 110 E-Business Suite techstack components
- Countless patches
- Dozens of optional configurations, including shared APPL_TOPs, RAC, SSL, DMZs, load-balancers, and so on
- Dozens of operating systems
There's an effectively infinite set of possible configurations and testing paths. Development and QA resources are fixed, so only a very small fraction of the possible configurations can ever be tested.
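To put a rough number on this, here is a back-of-the-envelope sketch in Python. It is an illustration, not an official count: it assumes "dozens" means two dozen, treats every optional configuration as an independent on/off choice, and ignores modules, techstack component versions, and patches entirely, so the real figure would be far larger. The function names and the testing rate of 1,000 configurations per day are made up for the example.

```python
# Rough lower bound on distinct E-Business Suite configurations.
# RELEASES comes from the list above. The other two counts are
# assumptions: "dozens" is taken to mean two dozen.

RELEASES = 10
OPERATING_SYSTEMS = 24   # assumption: "dozens" taken as 24
OPTIONAL_CONFIGS = 24    # assumption: 24 independent on/off options


def count_configurations(releases, operating_systems, optional_configs):
    """Count release x operating system x option-subset combinations."""
    # Each on/off option doubles the number of combinations.
    return releases * operating_systems * 2 ** optional_configs


def years_to_test(configurations, per_day):
    """Years needed to test every configuration at a fixed daily rate."""
    return configurations / per_day / 365


total = count_configurations(RELEASES, OPERATING_SYSTEMS, OPTIONAL_CONFIGS)
print(f"{total:,} configurations")
print(f"{years_to_test(total, 1000):,.0f} years at 1,000 configurations per day")
```

Even under these generous simplifications the count comes to about four billion combinations, and a team certifying a thousand of them every day would need thousands of years to finish.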
The upshot: your specific E-Business Suite configuration has not been tested by Oracle. This shouldn't come as a surprise, but we often ignore this ugly reality like the elephant in the living room.
Customers are sometimes upset to learn that we don't have environments identical to theirs in which we can reproduce their reported issues. They exclaim, "What do you mean, you didn't test with a Checkpoint firewall, a Cisco SSL accelerator, a Microsoft IIS reverse-proxy, and an F5 BIG-IP load-balancer?"
Implications for Apps DBAs
If the above are givens, then the following corollaries are inescapable:
- You are the only customer in the world with your specific configuration.
- Your problem may be exposed due to your particular configuration but may apply to a number of other configurations.
- Your problem may be the result of an untested execution path or configuration permutation.
- If we don't know that it's a problem, we can't fix it.
- You should report it by logging a Service Request through Metalink.
I know, I know... logging Service Requests via Metalink is right up there with getting a root canal. Here are some tips:
- Our Support Engineers won't have an environment that matches yours, so they may have trouble reproducing your issue. If so, help them reproduce it on your system.
- Those troubles get worse if you don't provide a detailed description of your environment. It helps to keep a canned description on hand, including a network or physical architecture diagram, listings of your techstack component versions and patches, and so on, and to upload it with every Service Request.
- Escalate urgent requests with a Support Duty Manager, particularly those where your Support Engineer appears to be struggling with a complex configuration. Simple techstack issues may be solved by entry-level Support Engineers, but highly idiosyncratic issues need an experienced hand.
- Remember that Support works on Service Requests but Development only works on bugs. For intractable problems, request that Development be engaged via a newly-logged bug for your issue.
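The canned environment description suggested in the tips above can be as simple as a text file that you regenerate whenever your system changes. The following Python sketch shows one possible way to keep it. The field names, the `ENVIRONMENT` dictionary, and the `render_summary` function are all hypothetical, and every value is a placeholder to be replaced with your own details; nothing here is a format that Support requires.

```python
# Hypothetical environment summary to upload with each Service Request.
# Every value below is a placeholder; replace it with your own details.

ENVIRONMENT = {
    "E-Business Suite release": "11.5.x",
    "Operating system": "<name and version>",
    "Techstack components": ["<component> <version>"],
    "Patches applied": ["<patch number>"],
    "Optional configurations": ["shared APPL_TOP", "RAC", "SSL"],
    "Architecture diagram": "<file name of attached diagram>",
}


def render_summary(environment):
    """Render the environment dictionary as plain text, one entry per key."""
    lines = []
    for heading, value in environment.items():
        if isinstance(value, list):
            # Lists become a heading followed by indented bullet items.
            lines.append(f"{heading}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{heading}: {value}")
    return "\n".join(lines)


summary = render_summary(ENVIRONMENT)
print(summary)
```

Keeping the details in one structure means the same summary goes out with every request, so a Support Engineer never has to ask for the basics twice.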
Software quality assurance is based on statistical sampling, not comprehensive tests. Quality is improved by large sample sizes, so we rely on your feedback more than you likely suspect. This is particularly true for Apps technology stack problems.
So, don't just work around your issues -- log Service Requests to let us know about them and we'll have a fighting chance of fixing them in future releases.