Tuesday Aug 18, 2015

Ouch, this hurts: bug 17325413 - patch BEFORE upgrade!

PatchI really don't want to turn this blog into something making our database look bad. But in this case it is really necessary as it is VERY UNUSUAL that we recommend to patch the database BEFORE upgrade


Just for clarification:

The following topic will affect databases 11.2.0.3.9+, 11.2.0.4.0 and 11.2.0.4.1 only - those (and only those) need to be patched BEFORE upgrade. The topic is fixed in 12.1.0.2 but as it gets introduced with the BEFORE upgrade database version you'll have to apply the fix before upgrade. The inclusion of the fix in 12.1.0.2 means only that the misbehavior won't happen there again. But as it is a meta data dictionary corruption you'll have to apply the fix before as otherwise it will break during or after the upgrade.


First of all, thanks to Ehtiram Hasanov (cleverbridge AG) and Oliver Pyka (http://www.pyka.de/) for highlighting this to me. And sorry for hitting this issue ...

Symptoms:

After upgrading to Oracle Database 12.1.0.2 you'll get one of the below errors when trying to read data: 

  • ORA-07445: exception encountered: core dump [qcsIsColInFro()+358] [SIGSEGV] [ADDR:0x4] [PC:0xCDB4A26] [Address not mapped to object] []
  • ORA-12899 / ORA-607
  • ORA-600 [kdmv_check_row_2:IMCU row has wrong contents]
  • ORA-600 [kddummy_blkchk]
  • ORA-600 [kdBlkCheckError]
  • ORA-600 [klaprs_12]
  • ORA-600 [13013]
  • ORA-600 [17182] 

Analysis:

Basically this happens when you try to drop a column with a DEFAULT value and a NOT NULL definition - it ends up with dropped column data being written to disk leading to block corruptions. This causes problems for generating undo which cannot be applied; a ROLLBACK fails.

If you need more information please look up this MOS Note about
Bug 17325413 - Drop column with DEFAULT value and NOT NULL definition ends up with dropped column data hitting disk leading to corruption

Versions being affected:

  • These versions require to be patched BEFORE upgrade:
    • Oracle Database 11.2.0.3.9 and above (may happen with earlier PSUs as well)
      Solution: Apply the fix 17325413  on top - see below
    • Oracle Database 11.2.0.4.0 and 11.2.0.4.1 
      Solution: Apply the most recent PSU
  • These versions can get you the issue if you haven't patched BEFORE upgrade:
    • Oracle Database 12.1.0.1
    • Oracle Database 12.1.0.2

Workaround and/or Fix:

The MOS Note about Bug 17325413 - Drop column with DEFAULT value and NOT NULL definition ends up with dropped column data hitting disk leading to corruption explains the workaround WHEN you hit this issues. 

As a precaution you will have to make sure that you applied one of those fixes BEFORE upgrading to Oracle Database 12.1.0.2. as the fix for Bug 17325413 is included in all those mentioned below (list is taken from above MOS Note as well).

The best way to avoid this is really to apply the patch (or the PSU/BP including the patch) before upgrading.

The issue has been mentioned in "Oracle 11.2.0.4 - Known Issues and Alerts" (MOS Note:1562139.1)  under "Issues Introduced":

Issues introduced

But that does jump into your eye as a thing you need to fix before upgrade.
We'll see if we can get the issue added to the 12c MOS Notes as "Upgrade Issues".

--Mike 

Tuesday Jul 28, 2015

Optimizer Issue in Oracle 12.0.1.2: "Reduce Group By"

Wrong Query Results BugDBAs biggest fears I'd guess are Optimizer Wrong Query Results bugs as usually the optimizer does not write a message into the alert.log saying "Sorry, I was in a bad mood today ..."

The Oracle Database Optimizer is a complex piece - and in Oracle 12c it delivers great performance results. Plus (my personal experience when you know what to do) it is more predictable which I like a lot when changing databases from one to another release. But due to its complexity sometimes we see issues - and sometimes it is necessary to switch off tiny little pieces until a fix is available.

Roy just came across this one - and we believe it's worth to tell you about it. Again, our intention is only to prevent issues when upgrading or migrating to Oracle Database 12.1.0.2.

Symptom:

An outer join query with a bind variable and a group by clause can produce wrong results in some cases.

Analysis:

 If all of the following match, you may be hitting this bug:
 - two or more subquery views are outer-joined on column C1
 - column C1 is specified on select list of top-most query block
 - column C1 is filtered on a bind value

Example:

 create table test1(c1 number(5),c2 varchar2(16));
 insert into test1 values(1,'3');
 commit;

 set NULL NULL
 variable num1 number
 execute :num1 :=1;

 -- Following query retuns wrong result(NULL), this should return 1.

 select V.c1 from
  (SELECT c1 FROM test1 GROUP BY c1) V,
  (SELECT c1 FROM test1 WHERE c2 = '1' GROUP BY c1) V2
 where  V.c1 = :num1
    and V.c1 = V2.c1(+);

Workaround:

alter session set "_optimizer_reduce_groupby_key" = false;

Please don't use the workaround:
alter session set optimizer_features_enable='12.1.0.1';
as this will switch off other good 12.1.0.2 optimizer features working very well.

More information:

See MOS Note:20634449.8 describing:
Bug 20634449 - Wrong results from OUTER JOIN with a bind variable and a GROUP BY clause in 12.1.0.2

As far as I can see there are no interim (one-off/single) patches available right now. 

--Mike

Friday Nov 07, 2014

Sleeping Beauties - Upgrade to 11.2.0.4 can be slow

A customer from the US did contact me past week via LinkedIn and raised a question:

"Is it expected that my patch set upgrade from Oracle 11.2.0.3 to Oracle 11.2.0.4 takes over 3 hours?"

Of course, no - this is not expected

This is the upgrade stats gathered post upgrade with utlu112s.sql:

SQL> @?/rdbms/admin/utlu112s.sql ; .
Oracle Database 11.2 Post-Upgrade Status Tool 10-31-2014 10:05:29
Component Current Version Elapsed Time
Name Status Number HH:MM:SS
Oracle Server  
. VALID 11.2.0.4.0 02:46:19
JServer JAVA Virtual Machine
. VALID 11.2.0.4.0 00:08:34
[..]
Final Actions
. 00:00:00
Total Upgrade Time: 03:06:47

No, this is not really expected. So we tried to nail down the root cause finding out these statements in the upgrade script c1102000.sql are causing the trouble:

194 -- wri$_optstat_histhead_history2.
195 execute immediate
196 q'#create unique index i_wri$_optstat_hh_obj_icol_st on
197 wri$_optstat_histhead_history (obj#, intcol#, savtime, colname)
198 tablespace sysaux #';
199
200 execute immediate
201 q'#create index i_wri$_optstat_hh_st on
202 wri$_optstat_histhead_history (savtime)
203 tablespace sysaux #';
204 end;
205 /

It's index rebuilds on histogram tables. And the customer has a large amount of stats data in his database as the default stats retention is 31 days.

Obviously the index rebuild is not done very efficiently (not done in parallel, no nologging clause). Those things can happen and sometimes this may not cause any issues. But in this case it lead to over 2 hours for just those index rebuilds.  

Luckily my colleague Cindy is an excellent resource for such things - after asking our team I've got the reply that this is tagged with a bug number and code fix already got checked in (under review right now):

bug19855835:
Upgrade from 11.2.0.2 to 11.2.0.4 is slow 

-Mike

PS: Credits go to Tan for bringing this to my attention - and sorry for the inconvenience! 

Thursday Jan 19, 2012

Fundamental Oracle flaw revealed??? Really ...?

This Infoworld article from Jan 17, 2012  Fundamental Oracle flaw revealed did alert Oracle database customers.Infoworld has raised this issue to Oracle before going public with it. Patches are included in the Jan 2012 CPU and PSU. So again, it's strongly recommended to apply the Jan 2012 PSU (or CPU if you are just asking for security fixes) to your environments.

What is the background of this issue?
Everything in an Oracle database is dependent on the SCN (System Change Number). This number is crucial to ensure read consistency. It will always be just incremented and is defined as a large 48-bit integer (281 trillion SCNs). But the SCN can jump as well - especially in cases of distributed transactions. Besides that hard limit there's also a soft limit for the SCN (see the MOS Note for more information).
Distributed Transaction

Hot backup bug
Now there's a backup bug which will increment the SCN to a much higher value once ALTER DATABASE BEGIN BACKUP gets used. We call this putting tablespaces into hot backup mode. Actually I'd assume that most people out there (at least those doing backups on a regular basis) use RMAN - and RMAN does not need to put anything into hot backup mode when creating online backups as the real downside of the hot backup mode is an increased value of log information.
Strong recommendation: Use RMAN! And you may apply patch 12371955: "High SCN growth rate from ALTER DATABASE BEGIN BACKUP in 11g" to your environment.

Combination of backup up and distributed transactions
The people who've detected this issue paint now a large Oracle database infrastructure to the wall - with many databases running distributed transactions - and a misbehaving BEGIN BACKUP routine in combination. This would elevate the SCN over and over again - on all interconnected databases - over time as the SCN will be synched over and over again - and will do huge jumps because of the backup bug.

What's the real risk?
I'm not a security expert - but I've seen many customer environments in the real world. I'd say (and skilled DBAs gotten interviewed by Infoworld and others stated similar opinions) it may be just a small risk in larger environments where many databases are connected together - and CPUs or PSUs got not applied on a regular basis. The PSU/CPU fix will prevent the SCN to be incremented in extensive jumps by several ways.
I'd completly disagree with Infoworld's prediction that databases will crash or abandon - transactions won't be executed anymore and an error will be raised. Yes, this is bad enough - true - but the database(s) will remain open.

What should you do?
Apply the January 2012 PSU or CPU and hot backup fix covered by patch 12371955. But keep in mind

  • Take the PSUs over CPUs as PSUs will contain also important non-dictionary changing fixes whereas CPUs contain security fixes only
  • You can't put a CPU on top of a previous applied PSU
  • Both CPUs and PSU are cummulative 
  • And well, you'll need Extended Support to get acces to PSUs or CPUs for Oracle Database 10.1 and 10.2 - and yes, please don't cry: We've asked you to upgrade a looooooong time ago ;-)

Tuesday Nov 29, 2011

Wrong statistics in AUX_STATS$ might puzzle the optimizer

We do recommend the creation of System Statistics for quite a long time. Since Oracle 9i the optimizer works with a CPU and IO cost based model. And in order to give the optimizer some knowledge about the IO subsystem's performance and throughput - once System Statistics are collected - they'll get stored in AUX_STATS$. For this purpose in the old Oracle 9i days some default values had been defined - and you'll still find those defaults in Oracle Database 11g Release 2 in AUX_STATS$. But these old values don't reflect the performance of modern IO systems. So it might be a good best practice post upgrade to create fresh System Statistics if you haven't done this before. 

You can collect System Statistics with:

exec DBMS_STATS.GATHER_SYSTEM_STATS('start');

and end it later by executing:

exec DBMS_STATS.GATHER_SYSTEM_STATS('stop');


You could also run DBMS_STATS.GATHER_SYSTEM_STATS('interval', interval=>N) instead where N is the number of minutes when statistics gathering is stopped automatically.

Please make sure you'll do this on a real workload period. It won't make sense to gather these values while the database is in an idle state. You should do this ideally for several hours. It doesn't affect performance in a negative way as the values are anyway collected in V$SYSSTAT and V$SESSTAT. And in case you'd like to delete the stats and revert to the old default values you'd simply execute:
exec DBMS_STATS.DELETE_SYSTEM_STATS;

The tricky thing in Oracle Database 11.2 - and that's why I'm actually writing this blog post today - is bug9842771. This leads to wrong values in AUX_STATS$ for SREADTIM and MREADTIM by factor 1000 guiding the optimizer sometimes into the totally wrong directon. The workaround is to overwrite these values manually and divide them by 10000. Use the DBMS_STATS.SET_SYSTEM_STATS procedure. See this MOS Note:9842771.8 for the above bug for some further information. This issue is fixed in Oracle Database 11.2.0.3 and above.

To get some background information about the statistics collected in please read this section in the Oracle Database 11.2 Performance Tuning Guide. And gathering System Statistics might have some implication if you have mixed workloads - and interacts with DB_FILE_MULTIBLOCK_READ_COUNT. For more information please read section 13.4.1.2.


Correction: I had to correct the factor (marked in RED above) from 1000 to 10000. Thanks to Andre Duvekot from the Netherlands for highlighting this to me!!!


About

Mike Dietrich - Oracle Mike Dietrich
Master Product Manager - Database Upgrade & Migrations - Oracle Corp

Based near Munich/Germany and spending plenty of time in airplanes to run either upgrade workshops or work onsite or remotely with reference customers. Acting as interlink between customers/partners and the Upgrade Development.

Follow me on TWITTER

Contact me via LinkedIn or XING

Search

Archives
« August 2015
SunMonTueWedThuFriSat
      
2
3
6
7
8
9
10
11
12
13
15
16
22
23
25
26
27
28
29
30
31
     
Today
Oracle related Tech Blogs
Slides Download Center
Visitors since 17-OCT-2011
White Paper and Docs
Workshops
Viewlets and Videos
Workshop Map
This week on my Rega & Pono
Upgrade Reference Papers