« For the want of XML parsing, iPayment was lost; for the want of not being able to take payment, business was lost.. | Main | When adcustomizer.sh may not work.. »

Troubleshooting: How block corruption of Workflow tables can cause Self Service Login not to work

Preface


This troubleshooting case study talks about a scenario in which the Self service login was not working due to a weird block corruption scenario in one of the workflow tables.

The scenario was that the Self Service login page was not working for a few users in a cloned instance. Even though the forms URL was working e.g. http://dev.mycompany.com/dev60cgi/f60cgi, the self service URL http://dev.mycompany.com/OA_HTML/AppsLocalLogin.jsp was not working at all for a few users. Furthermore, the usual debugging techniques drew a blank.

The observation was baffling. While there could be many reasons for it, getting to the root cause was difficult and involved some guess work related to what 11i resources are called for when a user login is attemped. This articles outlines the troubleshooting approach.

Applies to:

Oracle Self-Service Web Applications - Version: 11.0 to 12.0.
Oracle Applications 11i, Workflow, Oracle RDBMS 9i/10g

Troubleshooting Details

The Problem:

In a cloned 11i Apps Instance, for a few users, even though the forms URL was working e.g. http://dev.mycompany.com/dev60cgi/f60cgi, the self service URL http://dev.mycompany.com/OA_HTML/AppsLocalLogin.jsp was not working. This was disconcerting since they needed to access some self service functions specifically.

The only visible message was:

"You have encountered an unexpected error. Please contact the System Administrator for assistance."

The Approach:

Firstly, the user's password was changed using FNDCPASS utility:
sandbox:dev> FNDCPASS apps/$apps_pass 0 Y system/$sys_pass USER GVERMA welcome1
Log filename : L2355077.log

Report
filename : O2355077.out

sandbox:dev> more L2355077.log
+---------------------------------------------------------------------------+
Application
Object Library: Version : 11.5.0 - Development

Copyright (c) 1979, 1999, Oracle Corporation. All rights reserved.
module:

+---------------------------------------------------------------------------+

Current system time is 06-JUN-2007 11:21:16

+---------------------------------------------------------------------------+

+---------------------------------------------------------------------------+
Concurrent
request completed successfully
Current system time is 06-JUN-2007 11:21:16

+---------------------------------------------------------------------------+

However, resetting the user's application password did not work.

The DB alert log was checked. Nothing strange showed up there.

Debugging at Apache and Jserv level was turned on and apache error_log, access_log and jserv.log was monitored while replicating the issue, but that only yielded this in the access_log:
172.18.2.149 - - [06/Jun/2007:13:09:34 -0400] "GET 
/OA_HTML/AppsLocalLogout.jsp?returnUrl=%2FOA_HTML%2FAppsLocalLogin.jsp%3FcancelUrl%3D%
2FOA_HTML%2FAppsLocalLogin.jsp HTTP/1.1" 302 293
172.18.2.149 - - [06/Jun/2007:13:09:34 -0400] "GET
/OA_HTML/AppsLocalLogin.jsp?cancelUrl=/OA_HTML/AppsLocalLogin.jsp&langCode=US&username=GVERMA
HTTP/1.1" 200 6910
172.18.2.149 - - [06/Jun/2007:13:09:41 -0400] "POST /OA_HTML/fndvald.jsp HTTP/1.1" 302 260
172.18.2.149 - - [06/Jun/2007:13:09:42 -0400] "GET /OA_HTML/OA.jsp?OAFunc=OAHOMEPAGE HTTP/1.1" 200 1430
172.18.2.149 - - [06/Jun/2007:13:13:34-0400] "GET /OA_HTML/fndvald.jsp HTTP/1.1" 200 190
172.18.2.149 - - [06/Jun/2007:13:31:09 -0400] "GET /OA_HTML/AppsLocalLogin.jsp HTTP/1.1" 200 6647

It was as if the flow of control was being snapped totally after the login page, without any further activity at the OA Framework level at all.

Since forms URL was working and the full set of responsibilities was visible, one of the possible resources which was being accessed could have been the workflow tables for tying up responsbilities with workflow roles.

Based on this, a quick check to Workflow related tables revealed this error:

SQL> select count(1) from WF_LOCAL_USER_ROLES;

select count(1) from WF_LOCAL_USER_ROLES

*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 231, block # 235273)
ORA-01110:
data file 1254:
'/ORACLE/data/datafiles/dev/data11/aol_data17.dbf'
ORA-26040: Data block was loaded using the NOLOGGING option
So it seemed that either Production instance had part of APPLSYS.WF_LOCAL_USER_ROLES in NOLOGGING mode. The issue was rectified in Production by using

SQL> ALTER TABLE WF_LOCAL_USER_ROLES MODIFY PARTITION <partition_name> LOGGING;


One of the options was to re-import the APPLSYS.WF_LOCAL_USER_ROLES from Production, which would have involved a downtime.

A better analysis would be to find out the extent of damage done to the table. For this, the DBMS_REPAIR package would be useful. Reporting not only indicates what is wrong with a block, but also identifies the associated repair directive.

The entire list of options include:

block corruption list of options.gif:

There are two levels of block corruption: Hard and Soft. Soft corrupted blocks can be set to be ignored while reading the table's contents.

sandbox:dev> sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.7.0 - Production on Wed Jun 6 14:21:55 2007
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.7.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.7.0 - Production

SQL> show user
USER is "SYS"


Create the necessary aixiliary tables for identifying corrupt blocks in the table in question:

SQL> exec DBMS_REPAIR.ADMIN_TABLES (
TABLE_NAME => 'REPAIR_TABLE',
TABLE_TYPE => dbms_repair.repair_table,
ACTION => dbms_repair.create_action,
TABLESPACE => 'USERS');

PL/SQL procedure successfully completed.

SQL> desc repair_table
Name Null? Type
--------------------------------------    -------- ------------
OBJECT_ID                                 NOT NULL NUMBER
TABLESPACE_ID                             NOT NULL NUMBER
RELATIVE_FILE_ID                          NOT NULL NUMBER
BLOCK_ID                                  NOT NULL NUMBER
CORRUPT_TYPE                              NOT NULL NUMBER
SCHEMA_NAME                               NOT NULL VARCHAR2(30)
OBJECT_NAME                               NOT NULL VARCHAR2(30)
BASEOBJECT_NAME                                    VARCHAR2(30)
PARTITION_NAME                                     VARCHAR2(30)
CORRUPT_DESCRIPTION                                VARCHAR2(2000)
REPAIR_DESCRIPTION                                 VARCHAR2(200)
MARKED_CORRUPT                            NOT NULL VARCHAR2(10)
CHECK_TIMESTAMP                           NOT NULL DATE
FIX_TIMESTAMP                                      DATE
REFORMAT_TIMESTAMP                                 DATE


Check the specified objects and populates the repair table with information about corruptions and repair directives.Validation consists of block checking all blocks in the object. 

SQL> var c number

SQL> exec dbms_repair.check_object(
schema_name=>'APPLSYS',
object_name=>'WF_LOCAL_USER_ROLES',
corrupt_count=>:c);

PL/SQL procedure successfully completed.


See the number of corrupted blocks:

SQL> select :c from dual;

:C
----------
1934


Check the level of corruption -- either Soft of Hard:

SQL> l
1 select distinct SCHEMA_NAME, OBJECT_NAME, CORRUPT_TYPE, PARTITION_NAME, repair_DESCRIPTION, MARKED_CORRUPT
2* from repair_table
SQL> /

SCHEMA_NAM OBJECT_NAME CORRUPT_TYPE PARTITION_NAME REPAIR_DESCRIPTION MARKED_COR
---------- ------------------------------ ------------ ------------------------------ ------------------------------ ----------
APPLSYS WF_LOCAL_USER_ROLES 6148 HZ_PARTY mark block software corrupt TRUE

SQL>


This procedure fixes the corrupt blocks in specified objects based on information in the repair table that was previously generated by the check_object procedure.

Prior to effecting any change to a block, the block is checked to ensure the block is still corrupt. Corrupt blocks are repaired by marking the block software corrupt. When a repair is effected, the associated row in the repair table is updated with a fix timestamp.

SQL> exec DBMS_REPAIR.FIX_CORRUPT_BLOCKS (
SCHEMA_NAME => 'APPLSYS',
OBJECT_NAME=> 'WF_LOCAL_USER_ROLES', 
OBJECT_TYPE => dbms_repair.table_object, 
REPAIR_TABLE_NAME => 'REPAIR_TABLE', 
FIX_COUNT=> :c);
PL/SQL procedure successfully completed.


Enable the skipping of corrupt blocks during index and table scans of the specified object. When the object is a table, skip applies to the table and its indexes.

SQL> exec DBMS_REPAIR.SKIP_CORRUPT_BLOCKS ( 
SCHEMA_NAME=>'applsys',
OBJECT_NAME => 'WF_LOCAL_USER_ROLES',
OBJECT_TYPE => dbms_repair.table_object,
FLAGS => dbms_repair.skip_flag);

PL/SQL procedure successfully completed.


The corrupt blocks get skipped now:

SQL> select count(1) from applsys.WF_LOCAL_USER_ROLES;

COUNT(1)
----------
465748


At this stage, the DB verify utility can also be used to see the corrupted blocks' information:

sandbox:dev> dbv file=/ORACLE/data/datafiles/dev/data11/aol_data17.dbf blocksize=8192
....
....

DBV-00200: Block, dba 969120643, already marked corrupted
DBV-00200: Block, dba 969120644, already marked corrupted
DBV-00200: Block, dba 969120645, already marked corrupted
DBV-00200: Block, dba 969120646, already marked corrupted
DBV-00200: Block, dba 969120647, already marked corrupted
DBV-00200: Block, dba 969120648, already marked corrupted

DBVERIFY - Verification complete

Total Pages Examined : 256000
Total Pages Processed (Data) : 79791
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 32514
Total Pages Failing (Index): 0
Total Pages Processed (Other): 137041
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 6654
Total Pages Marked Corrupt : 1934
Total Pages Influx : 0
Highest block SCN : 6045645666279 (1407.2626680807)

sandbox:dev> dbv file=/ORACLE/data/datafiles/dev/data11/aol_data17.dbf blocksize=8192 2>&1 | grep DBV-00200 | wc -l
1934
sandbox:dev>

The Effect:

As a result of this workaround, the affected users were able to login through the self service applications URL and also able to see their assigned responsbilities.

Additionally, the apache access log showed the following depicting that application logic was flowing as intended:

172.18.2.149 - - [06/Jun/2007:14:16:37 -0400] "POST /OA_HTML/fndvald.jsp HTTP/1.1" 302 260
172.18.2.149 - - [06/Jun/2007:14:16:40 -0400] "GET /OA_HTML/OA.jsp?OAFunc=OAHOMEPAGE HTTP/1.1" 200 21284
172.18.2.149 - - [06/Jun/2007:14:16:41 -0400] "GET /OA_HTML/cabo/styles/cache/oracle-desktop-custom-2_2_22-en-ie-6-windows.css HTTP/1.1" 200 33224
172.18.2.149 - - [06/Jun/2007:14:16:41 -0400] "GET /OA_HTML/cabo/oajsLibs/oafcore510CU3.js HTTP/1.1" 200 25771
172.18.2.149 - - [06/Jun/2007:14:16:41 -0400] "GET /OA_MEDIA/FNDHOMEBRAND_MED.gif HTTP/1.1" 200 4553
172.18.2.149 - - [06/Jun/2007:14:16:41 -0400] "GET /OA_HTML/cabo/images/warnl.gif HTTP/1.1" 200 977
172.18.2.149 - - [06/Jun/2007:14:16:42 -0400] "GET /OA_HTML/blank.html HTTP/1.1" 200 98
172.18.2.149 - - [06/Jun/2007:14:16:42 -0400] "GET /OA_HTML/blank.html HTTP/1.1" 200 98
172.18.2.149 - - [06/Jun/2007:14:16:42 -0400] "GET /OA_HTML/blank.html HTTP/1.1" 200 98
172.18.2.149 - - [06/Jun/2007:14:16:42 -0400] "GET /OA_HTML/cabo/images/cache/en/bCustomAppsNavLinkgPPb.gif H1.1" 200 538
172.18.2.149 - - [06/Jun/2007:14:16:42 -0400] "GET /OA_MEDIA/fwkhp_folder.gif HTTP/1.1" 200 639
172.18.2.149 - - [06/Jun/2007:14:16:42 -0400] "GET /OA_HTML/cabo/images/cache/en/bCustomLinkgPLN.gif HTTP/1.1" 200 528
172.18.2.149 - - [06/Jun/2007:14:30:11 -0400] "GET /OA_HTML/AppsLocalLogin.jsp HTTP/1.1" 200 6647
172.18.2.149 - - [06/Jun/2007:14:30:19 -0400] "GET /OA_HTML/OA.jsp?OAFunc=OAHOMEPAGE&akRegionApplicationId=0&navRespId=50958&navRespAppId=660&navSecGrpId=0&transactionid=666561749&oapc=3 HTTP/1.1" 200 23405
172.18.2.149 - - [06/Jun/2007:14:30:19 -0400] "GET /OA_MEDIA/fwkhp_sswafunc.gif HTTP/1.1" 200 659
172.18.2.149 - - [06/Jun/2007:14:31:45 -0400] "GET /OA_HTML/OALogout.jsp?menu=Y HTTP/1.1" 302 270
172.18.2.149 - - [06/Jun/2007:14:31:45 -0400] "GET /oa_servlets/oracle.apps.fnd.sso.AppsLogout HTTP/1.1" 302 308
172.18.2.149 - - [06/Jun/2007:14:31:45 -0400] "GET /OA_HTML/AppsLocalLogout.jsp?returnUrl=%2FOA_HTML%2FAppsLocalLogin.jsp%3FcancelUrl%3D%2FOA_HTML%2FAppsLocalLogin.jsp HTTP/1.1" 302 293
172.18.2.149 - - [06/Jun/2007:14:31:45 -0400] "GET /OA_HTML/AppsLocalLogin.jsp?cancelUrl=/OA_HTML/AppsLocalLogin.jsp&langCode=US&username=GVERMA HTTP/1.1" 200 6916

The Learning

  • The regular debugging mechanisms may not be helpful in troubleshooting the issue at hand and one may have to rely on some basic checks
  • Sometimes simple things like block corruption to basic workflow tables can cause critical user functions to not work
  • Oracle supplied packages like DBMS_REPAIR may be used to analyze and fix such block corruptions, especially when it is acceptable to skip soft corrupted blocks

TrackBack

TrackBack URL for this entry:
http://blogs.oracle.com/mte1521/mt-tb.cgi/2128

Comments (1)

Verma,
Your posts are great.
Do you think because of some bug the corruption happened. we also just noticed that WF table was corrupted in cloned instance.
We are running on 9.2.0.5 with 11.5.10.1

If so any bug# would be great.


Thanks

Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)

About This Entry

This page contains a single entry from the blog posted on June 30, 2007 9:11 PM.

The previous post in this blog was For the want of XML parsing, iPayment was lost; for the want of not being able to take payment, business was lost...

The next post in this blog is When adcustomizer.sh may not work...

Many more can be found on the main index page or by looking through the archives.

Top Tags

Creative Commons License
This weblog is licensed under a Creative Commons License.
Powered by
Movable Type and Oracle