The Software Vendor's Dilemma

Last year was another big one for us in terms of releases, and there are more new certifications and patches in the pipeline for the E-Business Suite.  This bounty creates challenges for IT managers.  In the last month or so, I've been involved in a number of customer dialogues where I've had to discuss The Software Vendor's Dilemma and its concomitant implications for IT management policies. 


The Software Vendor's Dilemma

For the limited purposes of this discussion, let's make the following assertions:
  • All software has bugs of some kind
  • A given bug will impact some customers more than others
  • Some bugs will be hidden until other bugs are fixed
  • Some bug fixes will introduce new bugs
  • Customers incur direct and indirect costs when applying bug fixes
This gives rise to The Software Vendor's Dilemma:
  • If you don't issue bug fixes quickly, some users will complain
  • If you do issue bug fixes quickly, other users will complain

The IT Manager's Dilemma

Assuming that your software vendors provide patches, that leads naturally and inevitably to the IT Manager's Dilemma:
  • If you don't apply patches, your end-users will complain
  • If you do apply patches, your end-users will complain... about maintenance downtimes and new rounds of User Acceptance Tests

Getting Concrete

So far, I've been speaking in generalities.  Let's get concrete and talk about what we do here in the Oracle Applications Technology Group.  When we fix E-Business Suite technology stack bugs, there are several major ways of getting those fixes into your hands:
  1. A small patch that fixes a single problem for a specific technology stack component
  2. A collection of patches for that particular technology stack component
    (e.g. for AutoConfig)
  3. A collection of patches for a set of interdependent technology stack components
    (e.g. Oracle Applications Framework)
  4. A collection of all patches released for all components in the Applications Technology Group product family
The higher the category number, the broader the patch's impact.  Accordingly, we subject patches in Category 4 to greater scrutiny than patches in Category 1.

Drawing the Line


Within the limited context of this discussion, one of our responsibilities here in the Applications Technology Group is to ensure that we get fixes to you as quickly as possible.  A given fix may appear in any one -- or all -- of the categories above. 

It's up to you to assess which categories of patches you should apply.  We don't have the right to dictate what patches you must install.  This is why we flag a given patch as recommended, or highly recommended, but never mandatory.

Likewise, we don't specify when you should apply them, how to test them, how to mitigate your operational risks, nor how to convince your management and business stakeholders to agree to the resulting maintenance downtimes and upgrades.  These important business questions are outside of Oracle Development's scope to answer, since their answers depend upon your own operational norms, business priorities, practices, processes, and beliefs.  That's the meat-and-potatoes of an IT manager's core responsibilities.

Some Considerations for Managing Applications Patches

I'm assuming that you already have a business framework for assessing and managing a patch's costs, benefits, and risks.  Here are some additional things from our own Oracle patching processes that you might wish to consider for E-Business Suite technology stack patches:
  • Here at Oracle, all Apps patches go through multiple staging environments, each with its own tests and exit criteria, before being deployed into production.  This may seem blindingly obvious, but I'm repeatedly surprised to hear that some of our large customers don't follow this kind of process to minimize risk.

  • DBAs, system administrators, and IT managers will have differing perspectives on the value and priority of a given patch, and input from all levels is often required to make a good operational decision.  And don't forget your end-users:  ignore their input at your own peril. 

  • All of the quarterly Critical Patch Updates are highly recommended, and we apply all of these -- without fail -- to all of our own Oracle environments.

  • Individual "one-of-a-kind" emergency patches receive less testing than ATG Family Pack Rollup patches, so we only apply these to Oracle production environments when absolutely necessary, i.e. when the business benefits clearly outweigh the risks.
  • We apply all ATG Family Pack Rollup patches to all critical environments as soon as they're released.  No questions, no exceptions.  These Rollup patches are cumulative and are subjected to the highest level of testing that we can bring to bear on a patch.
  • We use AutoConfig to minimize the overhead and hassle of managing a given patch's impact on configuration files.  And yes, even though we tell you not to do so, sometimes we have to customize those configuration files.  If you must customize a configuration file, follow our guidelines so that AutoConfig preserves your customizations.
  • Not all READMEs are created equal.  Our internal Oracle IT staff are not shy about telling us when a patch's README is ambiguous, misleading, or incomplete.  Their feedback can be rather pointed sometimes.  Likewise, if a patch's README leaves you baffled, log a Service Request via Metalink and get a definitive answer before applying a particular patch.

Planning For It

I don't know much, but I know these things to be true:  nobody likes going to the dentist, or mowing the lawn, or cleaning the toilet, but these are still things that you need to do regularly.  The longer you put them off, the worse it will be.

The same goes for patching your E-Business Suite environments.  Having a clearly-defined patch prioritization and management process and scheduling a few maintenance windows a year to apply Critical Patch Updates and E-Business Suite Family Packs is a lot less painful than having a Severity 1 patching crisis on the night before Thanksgiving. 

If you have tips on navigating the Scylla and Charybdis of E-Business Suite patching, I'd be delighted if you hit the Comment link and shared them with our readers.

Comments:

Well, it is kind of a dilemma.
When an SR or Metalink informs me that I have to apply a patch I feel sad, but I'm kind of used to it now ;-)
I have a 24/7 Apps production environment, and downtime is a big decision to make. Even when we do have downtime, it is always scheduled from 3:00 am and should finish before 8:00 am on a weekend day (because at this time we deal with the minimum number of customers). We don't ignore those customers during the downtime; we use hard phones instead of the Oracle softphone.

fadi
http://oracle-magic.blogspot.com

Posted by Fadi Hasweh on January 10, 2007 at 11:07 PM PST #

Steve,

You hit the nail right on the head. Everyone faces these issues periodically, and so often now that it is very important to find a pattern and come up with standards for a "patching" and "testing" strategy, especially for ATG. My past experience while patching the FND or ATG areas is that, on one side, the patches provide fixes to age-old problems, but on the other, some very basic and very old feature that has not been touched or changed since 11.5.1 may suddenly stop working. This bears out your statement that when a fix resolves one issue, there is a possibility of introducing a new one.

As an example, we applied ATG.H (a prerequisite for HR.K.1) and RUP3. Testing went on for 3 months and finally the patches progressed to production, and soon tons of problems/issues appeared out of nowhere... One of the simple ones: a user forgot his password and got it reset, then messed up again on his very first login attempt. At that point his password reset limit was reached (even though this was his first attempt) and he had to request the password reset again... This is a very difficult test case to catch in the testing phase. The user community did test password resets, but who would have thought, "let me fail the very first attempt and see what happens"? Later it was found to be a known bug: when the password was reset, the user's password retry limit was not being reset. The patch was applied and the problem was resolved.

Now the bigger question at this point is: how does one build a standard test case without knowing in detail which areas require detailed and thorough testing and which areas need only normal testing? I have listed a couple of solutions/suggestions below:

1) Avoid being the first adopter of any new family packs, RUPs, or patches of significant size and impact, though someone has to be the first. What I have found is that it is very helpful to wait for a while, between 3 and 6 months, check Metalink thoroughly for reported issues, and then make a fair judgement about when to take the particular set of patches.

But a challenge here is that all the known/reported problems are entered into Metalink in a "free format" without any patterns, so they are very difficult to track. For example, if you want to find RUP4-related problems and you type in all possible combinations of RUP4, RUP.4, RUP_4, RUP 4, then maybe you might hit all the issues, but if a few issues are reported under ROLLUP 4 and variations then you are doomed completely. If there is any way to force a format for issues reported due to big patches, this would be a big help. Even better would be to show all the problems related to a patch when a person goes to download it. I know the advanced search is supposed to work but it does not.

2) Educate the end users about all the code changes at a high (business use) level. Instead of looking at the source code and file names that got changed and making a judgement call about what was affected, this might help the end users focus on what needs to be tested and what does not.

3) Share the test cases that the Oracle internal team has carried out, so that if a particular client is using functionality which was not part of these test cases, the client can add it to their own, and then maybe we can protect ourselves before the patch hits production.

Normally, if we include these patches in major releases (which happen quarterly), we have a thorough and generalized test plan. If not, we rely on "time testing" of the patch, wherein the patches sit in the TEST and ACCEPTANCE environments for a sufficient period of time and then progress to production. But neither of these routes manages to catch the not-so-obvious bugs.

Posted by Nandita Saigal on January 11, 2007 at 12:22 AM PST #

A very interesting article. I particularly like the analogy of "you are damned if you DON'T apply the patches ... and damned if you DO". I guess it is a bit like servicing your car: if you do it each year it should generally be OK, but if you leave it for an extended period you end up paying higher bills anyway and you might suffer a major problem in the meantime.

Posted by Scott Jackson on January 11, 2007 at 01:08 AM PST #

Nandita,

Excellent tips -- thank you for taking the time to share these. I'm sympathetic to the challenge of searching in Metalink for something specific that can have multiple names.  This is an area where all search technologies can improve.

Your comment about educating end-users touches on a critical point that's sometimes overlooked:  your functional users' tests (as part of the User Acceptance Testing stage) will potentially be more effective if the users are alerted to the areas affected by a patch.

As for test cases, you might find this article interesting: Automated Testing for the E-Business Suite

Regards,
Steven

Posted by Steven Chan on January 11, 2007 at 04:37 AM PST #

Dale,

Quite right --  enhancement requests and new features do pose greater risk than fixes to code that is predominantly stable and well-understood.

In general, small "one-of-a-kind" fixes will be flagged as bug fixes.  Patches that contain multiple fixes should also be flagged or documented appropriately in their READMEs.

If you log a Service Request (SR) for a specific issue, and a Support Engineer responds by recommending a small patch, this will generally contain only bug fixes for that issue.

However, ATG Rollup patchsets will often contain a mix of bug fixes and new features, and a careful review of the (often large) READMEs will reveal the new and changed functionality.  This is why ATG Rollups are subjected to much more rigorous and comprehensive testing than smaller patchsets.  Again, if a particular patch's README leaves you with any doubt as to the nature of the code contained within, log an SR requesting more details.  This is essential information that you need to qualify the risk associated with a given patch.

Regards,
Steven

Posted by Steven Chan on January 11, 2007 at 08:30 AM PST #

Whereas it is surely true that a bug fix will expose another bug or bugs in some cases, I would expect that the proportion of "downstream breakages" from bug fixing would be much less than the amount of bug creation from implementing enhancements. As long as the "bug fix" is just that, I expect the incidence of bugs caused by bug fixes would be quite low. Therefore it makes more sense for managers to patch, as long as patches can be isolated down to "bug fixes". Perhaps this is where oracle can help, by providing focussed "bug fix only" patches?

Posted by Dale Ogilvie on January 11, 2007 at 10:00 AM PST #

Steven,

An excellent, extremely well written and thought provoking article about the importance of ERP packaged software preventative maintenance.

Too many companies who purchase ERP applications expect them to be "static". This is an unrealistic and common theme from the same users who also expect that the software should be able to do everything that they can possibly think of doing...:-)

Great stuff.

John

Posted by John Stouffer on January 13, 2007 at 08:51 AM PST #

Thanks for your comments, John.  I hope this helps start new conversations - or add impetus to existing ones - on the subject of ongoing maintenance.

Regards,
Steven

Posted by Steven Chan on January 16, 2007 at 04:48 AM PST #
