The Blue Bridge of Death

Mary Ann Davidson's recent remarks at the WWW2006 Conference in Scotland comparing civil engineers to software developers have some interesting support implications for our community of Apps DBAs.


Is "Software Engineer" An Oxymoron?

One of her provocative points is that people would face the Blue Bridge of Death every day if civil engineers built bridges in the same manner in which we build software. 

This raises the question: why can't we make ERP software as reliable as a bridge? 

The short answer:  A bridge is built from concrete and reinforcing steel and has a finite number of possible failure points, each of which can be comprehensively tested.  There is no software equivalent, so comprehensive testing is impossible.

An Effectively Infinite Set of Possibilities

Let's try to define the problem with some simplifying assumptions.  There are approximately:
  • 200 E-Business Suite modules
  • 10 E-Business Suite 11i releases (e.g., 11.5.0, 11.5.1)
  • 110 E-Business Suite techstack components
  • Countless patches
  • Dozens of optional configurations, including shared APPL_TOPs, RAC, SSL, DMZs, load-balancers, and so on
  • Dozens of operating systems

The Elephant in the Living Room

There's an effectively infinite set of possible configurations and testing paths.  Development and QA resources are fixed, so, logically, only a very, very small percentage of the possible testing configurations can be tested.

The upshot: your specific E-Business Suite configuration has not been tested by Oracle.  This shouldn't come as a surprise, but we often ignore this ugly reality like the elephant in the living room. 

Customers are sometimes upset to learn that we don't have environments identical to theirs in which we can reproduce their reported issues.  They exclaim, "What do you mean, you didn't test with a Checkpoint firewall, a Cisco SSL accelerator, a Microsoft IIS reverse-proxy, and an F5 BIG-IP load-balancer?"

Implications for Apps DBAs

If the above are givens, then the following corollaries are inescapable:
  1. You are the only customer in the world with your specific configuration.
     
  2. Your problem may be exposed due to your particular configuration but may apply to a number of other configurations.
     
  3. Your problem may be the result of an untested execution path or configuration permutation.
     
  4. If we don't know that it's a problem, we can't fix it.
     
  5. You should report it by logging a Service Request in Metalink.

Getting Support for Techstack Issues

I know, I know... logging Service Requests via Metalink is right up there with getting a root canal.  Here are some tips:
  1. Our Support Engineers won't have an environment that matches yours, so they may have trouble reproducing your issue.  If so, help them reproduce it on your system.
     
  2. Those troubles will be worsened if you fail to provide a detailed description of your environment.  It helps to have a canned description of your environment, including a network or physical architecture diagram, listings of your techstack component versions and patches, and so on, which you upload with every Service Request.
     
  3. Escalate urgent requests to a Support Duty Manager, particularly those where your Support Engineer appears to be struggling with a complex configuration.  Simple techstack issues may be solved by entry-level Support Engineers, but highly idiosyncratic issues need an experienced hand.
     
  4. Remember that Support works on Service Requests but Development only works on bugs.  For intractable problems, request that Development be engaged via a newly logged bug for your issue.
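
One way to keep the canned environment description from tip 2 handy is to hold it as structured data and render it to plain text for upload with each Service Request.  The sketch below is only an illustration: the class name, field names, and sample values are all hypothetical, and it does not call any Oracle or Metalink interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EnvironmentDescription:
    """Hypothetical container for a reusable environment summary."""
    ebs_release: str
    operating_system: str
    techstack_versions: Dict[str, str] = field(default_factory=dict)
    patches: List[str] = field(default_factory=list)
    optional_configs: List[str] = field(default_factory=list)
    architecture_diagram: str = ""  # file name of a diagram to attach

    def render(self) -> str:
        """Return a plain-text summary suitable for pasting or uploading."""
        lines = [
            f"E-Business Suite release: {self.ebs_release}",
            f"Operating system: {self.operating_system}",
            "Techstack components:",
        ]
        # Sort by component name so the output is stable between runs.
        for name, version in sorted(self.techstack_versions.items()):
            lines.append(f"  - {name}: {version}")
        lines.append("Patches applied:")
        lines.extend(f"  - {patch}" for patch in self.patches)
        lines.append("Optional configurations:")
        lines.extend(f"  - {item}" for item in self.optional_configs)
        if self.architecture_diagram:
            lines.append(f"Architecture diagram: {self.architecture_diagram}")
        return "\n".join(lines)


# Placeholder values only; substitute the details of your own system.
description = EnvironmentDescription(
    ebs_release="11.5.1",
    operating_system="<your OS and version>",
    techstack_versions={"<component B>": "<version>", "<component A>": "<version>"},
    patches=["<patch number>"],
    optional_configs=["RAC", "SSL"],
    architecture_diagram="architecture.png",
)
print(description.render())
```

Keeping the description in one place means every Service Request starts from the same, current picture of your system instead of a summary retyped from memory.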

Software quality assurance is based on statistical sampling, not comprehensive tests.  Quality is improved by large sample sizes, so we rely on your feedback more than you likely suspect.  This is particularly true for Apps technology stack problems. 

So, don't just work around your issues -- log Service Requests to let us know about them and we'll have a fighting chance of fixing them in future releases.

Comments:

Nice comments. Unfortunately, balanced, realistic points such as these aren't 'headline makers' like the "Blue Bridge of Death" stuff.

Posted by Kevin on May 30, 2006 at 01:26 AM PDT #

Thanks for your comments, Kevin.  I think the positive aspect of the Blue Bridge of Death stuff is that it opens the debate about software quality in general.  It certainly caught my eye.

Regards,
Steven

Posted by Steven Chan on May 30, 2006 at 01:41 AM PDT #

Steve,

Brilliant post!
Perfect software is theoretically possible, but at the cost it would incur, no one would buy it.
A lot of functional and QA people lose sight of this sometimes.

Rgds,
Jay

Posted by Jay on May 30, 2006 at 01:45 AM PDT #

Yes, true.  There are those defects that we must avoid through design or explicit testing, and then there's the rest of them that arise from the inherent chaos of complex software in the field.  The key is ensuring that the first class of defects is well-defined and understood.

Regards,
Steven

Posted by Steven Chan on May 30, 2006 at 01:45 AM PDT #

Regarding your comment about providing a "canned description of your environment": this is easy to do now with the new Metalink facility called "My Configs & Projects".  You see this as a separate tab in Metalink now.  If you use this facility, Support will see a stamp in the SR text to highlight that this information is available, so you do not need to load the data for each SR.  It can also be automated, if you wish it to be so, to keep the data that is available to Oracle Support up to date.

For more information see these Metalink notes:
Note 365052.1: Learn More About My Configs & Projects - New Version
Note 356018.1: My Configs & Projects FAQ

Posted by Mike Shaw on May 31, 2006 at 11:58 PM PDT #

Great tips, Mike.  Thanks for the pointer.

Regards,
Steven

Posted by Steven Chan on June 01, 2006 at 02:33 AM PDT #

I think the truth probably lies somewhere in the fact that software is one of the youngest of engineering disciplines. I can't imagine that customers, 100 or 200 years from now, will accept the same number of shoddy defects as in the software being delivered today, regardless of which particular combination of operating system, hardware, and middleware they are running. (Aside: can you name two identical bridges anywhere in the world? With the same span, climate, geographic conditions, volume of traffic and building materials?)
Remember that civil and mechanical engineering have had (and continue to have) their share of both minor and catastrophic failures; however, they have continually evolved, learning the lessons from these failures and preventing their recurrence.

Posted by Martin Connelly on June 07, 2006 at 12:47 AM PDT #

Yes, that's an interesting point, Martin.

The relative maturity of the two disciplines is markedly different.

One of the earliest examples of civil engineering is believed to be the Saqqarah stepped pyramid, built by Imhotep in 2550 B.C.  This still stands today.

Charles Babbage's Analytical Engine is thought to be the first description of a computer, and an evolved version of that wasn't even built by his son until 1910 A.D.  According to Wikipedia, it was buggy -- it generated a list of multiples of pi... incorrectly.

It will be interesting to see how our industry matures within our lifetimes.

Regards,
Steven

Posted by Steven Chan on June 07, 2006 at 02:00 AM PDT #

Thanks, Steven, for this insightful posting. I liked the brutal honesty that some of the Support engineers are entry level, and that issues need to be escalated to a Support Duty Manager. A couple of times, we've had to push and shove before we got any sensible solution from Oracle Support, after we'd been asked to run the same diagnostics over and over again. It's really evident that some of the engineers are struggling. But on a positive note, we always get resolutions to issues we've logged. Is there a way Oracle can streamline its escalation procedure, so that such cases are minimized? I work for a Certified Oracle Advantage Partner, and most of the time we get the verbal thrashing on behalf of Oracle when solutions take long to come through.

Posted by Timothy Agaba on June 28, 2007 at 09:11 PM PDT #

Hi, Timothy,

Thanks for your comment.  I sympathize with your situation.  Speaking generically:

Our Support teams are always working on better internal training, intra-team communication and mentoring, and other process improvements.  The E-Business Suite is a big and technically-rich space, however, so keeping everyone up to the same level of skill with new stuff is always admittedly a challenge.

Some specific advice for your situation:

One method of streamlining escalation procedures is to work with an Escalated Service Delivery Manager (ESDM).  An ESDM can monitor all of your SRs and ensure that they don't spin or go off into the woods, and bring together people from multiple teams (including various Development groups) on critical SRs.  This is a specialized Support offering, so it might be worth chatting with your Oracle account manager about whether this would be a good fit for your firm in general, or for specific projects for your customer.

Regards,
Steven

Posted by Steven Chan on July 02, 2007 at 01:12 AM PDT #
