Preface
This troubleshooting case study talks about a scenario in which the Self service login was not working due to a weird block corruption scenario in one of the workflow tables.
The scenario was that the Self Service login page was not working
for a few users in a cloned instance. Even though
the forms URL was working e.g.
http://dev.mycompany.com/dev60cgi/f60cgi,
the self service URL
http://dev.mycompany.com/OA_HTML/AppsLocalLogin.jsp
was not working
at all for a few users. Furthermore, the usual debugging techniques drew a blank.
The
observation was baffling. While there could be many reasons for it,
getting
to the root cause was difficult and involved some guess work related to
what 11i resources are called for when a user login is attemped. This articles outlines the troubleshooting approach.
Applies to:
Oracle Self-Service Web Applications - Version: 11.0 to 12.0.
Oracle Applications 11i, Workflow, Oracle RDBMS 9i/10g
Troubleshooting Details
The Problem:
In a cloned 11i Apps Instance, for a few users, even though the forms URL was
working e.g.
http://dev.mycompany.com/dev60cgi/f60cgi, the self service URL
http://dev.mycompany.com/OA_HTML/AppsLocalLogin.jsp was not working.
This was disconcerting since they needed to access some self service functions specifically.
The
only visible message was:
"You have encountered an unexpected error. Please contact the System Administrator for assistance."
The Approach:
Firstly, the user's password was changed using FNDCPASS utility:
sandbox:dev> FNDCPASS apps/$apps_pass 0 Y system/$sys_pass USER GVERMA welcome1
Log filename : L2355077.log
Report
filename : O2355077.out
sandbox:dev> more L2355077.log
+---------------------------------------------------------------------------+
Application
Object Library: Version : 11.5.0 - Development
Copyright (c) 1979, 1999, Oracle Corporation. All rights reserved.
module:
+---------------------------------------------------------------------------+
Current system time is 06-JUN-2007 11:21:16
+---------------------------------------------------------------------------+
+---------------------------------------------------------------------------+
Concurrent
request completed successfully
Current system time is 06-JUN-2007 11:21:16
+---------------------------------------------------------------------------+
However, resetting the user's application password did not work.
The DB alert log was checked. Nothing strange showed
up there.
Debugging at Apache and Jserv level was turned on and apache e
rror_log, access_log and
jserv.log was monitored while replicating the issue, but that only yielded this in the
access_log:
172.18.2.149 - - [06/Jun/2007:13:09:34 -0400] "GET
/OA_HTML/AppsLocalLogout.jsp?returnUrl=%2FOA_HTML%2FAppsLocalLogin.jsp%3FcancelUrl%3D%
2FOA_HTML%2FAppsLocalLogin.jsp HTTP/1.1" 302 293
172.18.2.149 - - [06/Jun/2007:13:09:34 -0400] "GET
/OA_HTML/AppsLocalLogin.jsp?cancelUrl=/OA_HTML/AppsLocalLogin.jsp&langCode=US&username=GVERMA
HTTP/1.1" 200 6910
172.18.2.149 - - [06/Jun/2007:13:09:41 -0400] "POST /OA_HTML/fndvald.jsp HTTP/1.1" 302 260
172.18.2.149 - - [06/Jun/2007:13:09:42 -0400] "GET /OA_HTML/OA.jsp?OAFunc=OAHOMEPAGE HTTP/1.1" 200 1430
172.18.2.149 - - [06/Jun/2007:13:13:34-0400] "GET /OA_HTML/fndvald.jsp HTTP/1.1" 200 190
172.18.2.149 - - [06/Jun/2007:13:31:09 -0400] "GET /OA_HTML/AppsLocalLogin.jsp HTTP/1.1" 200 6647
It
was as if the flow of control was being snapped totally after the login
page, without any further activity at the OA Framework
level at all.
Since forms URL was working and the full set of responsibilities was visible, one of the possible resources
which was being accessed could have been the workflow tables for tying up responsbilities with workflow roles.
Based on this, a quick check to Workflow related tables revealed this error:
SQL> select count(1) from WF_LOCAL_USER_ROLES;
select count(1) from WF_LOCAL_USER_ROLES
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 231, block # 235273)
ORA-01110:
data file 1254:
'/ORACLE/data/datafiles/dev/data11/aol_data17.dbf'
ORA-26040: Data block was loaded using the NOLOGGING option
So it seemed that either Production instance had part of APPLSYS.WF_LOCAL_USER_ROLES in NOLOGGING mode.
The issue was rectified in Production by using
SQL> ALTER TABLE WF_LOCAL_USER_ROLES MODIFY PARTITION <partition_name> LOGGING;
One of the options was to re-import the APPLSYS.WF_LOCAL_USER_ROLES from Production, which would have involved a downtime.
A
better analysis would be to find out the extent of damage done to the table. For this, the DBMS_REPAIR package would be useful.
Reporting not only indicates what is wrong with a block, but also identifies the associated repair directive.
The entire
list of options include:

There
are two levels of block corruption: Hard and Soft. Soft corrupted
blocks can be set to be ignored while reading the table's contents.
sandbox:dev> sqlplus "/ as sysdba"
SQL*Plus: Release 9.2.0.7.0 - Production on Wed Jun 6 14:21:55 2007
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.7.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.7.0 - Production
SQL> show user
USER is "SYS"
Create the necessary aixiliary tables for identifying corrupt blocks in the table in question:
SQL> exec DBMS_REPAIR.ADMIN_TABLES (
TABLE_NAME => 'REPAIR_TABLE',
TABLE_TYPE => dbms_repair.repair_table,
ACTION => dbms_repair.create_action,
TABLESPACE => 'USERS');
PL/SQL
procedure successfully completed.
SQL> desc repair_table
Name Null? Type
-------------------------------------- -------- ------------
OBJECT_ID NOT
NULL NUMBER
TABLESPACE_ID
NOT NULL NUMBER
RELATIVE_FILE_ID NOT
NULL NUMBER
BLOCK_ID
NOT NULL NUMBER
CORRUPT_TYPE
NOT NULL NUMBER
SCHEMA_NAME
NOT NULL VARCHAR2(30)
OBJECT_NAME
NOT NULL VARCHAR2(30)
BASEOBJECT_NAME
VARCHAR2(30)
PARTITION_NAME
VARCHAR2(30)
CORRUPT_DESCRIPTION VARCHAR2(2000)
REPAIR_DESCRIPTION VARCHAR2(200)
MARKED_CORRUPT
NOT NULL VARCHAR2(10)
CHECK_TIMESTAMP
NOT NULL DATE
FIX_TIMESTAMP
DATE
REFORMAT_TIMESTAMP DATE
Check
the specified objects and populates the repair table with information
about corruptions and repair directives.Validation consists of block
checking all blocks in the object.
SQL> var c number
SQL> exec dbms_repair.check_object(
schema_name=>'APPLSYS',
object_name=>'WF_LOCAL_USER_ROLES',
corrupt_count=>:c);
PL/SQL procedure successfully completed.
See the number of corrupted blocks:
SQL> select :c from dual;
:C
----------
1934
Check the level of corruption -- either Soft of Hard:
SQL> l
1 select distinct SCHEMA_NAME, OBJECT_NAME, CORRUPT_TYPE,
PARTITION_NAME, repair_DESCRIPTION, MARKED_CORRUPT
2* from repair_table
SQL> /
SCHEMA_NAM OBJECT_NAME CORRUPT_TYPE
PARTITION_NAME REPAIR_DESCRIPTION MARKED_COR
---------- ------------------------------ ------------ ------------------------------
------------------------------ ----------
APPLSYS WF_LOCAL_USER_ROLES 6148 HZ_PARTY mark block software corrupt TRUE
SQL>
This procedure fixes the corrupt blocks in specified objects based on information in the repair table that
was previously generated by the check_object procedure.
Prior
to effecting any change to a block, the block is checked to
ensure the block is still corrupt. Corrupt blocks are repaired by
marking the block software corrupt. When a repair is effected, the
associated row in the repair table is updated with a fix timestamp.
SQL> exec DBMS_REPAIR.FIX_CORRUPT_BLOCKS (
SCHEMA_NAME => 'APPLSYS',
OBJECT_NAME=> 'WF_LOCAL_USER_ROLES',
OBJECT_TYPE => dbms_repair.table_object,
REPAIR_TABLE_NAME => 'REPAIR_TABLE',
FIX_COUNT=> :c);
PL/SQL procedure successfully completed.
Enable
the skipping of corrupt blocks during index and table scans of the
specified object. When the object is a table, skip applies to the table
and its indexes.
SQL> exec DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (
SCHEMA_NAME=>'applsys',
OBJECT_NAME => 'WF_LOCAL_USER_ROLES',
OBJECT_TYPE => dbms_repair.table_object,
FLAGS => dbms_repair.skip_flag);
PL/SQL procedure successfully completed.
The corrupt blocks get skipped now:
SQL> select count(1) from applsys.WF_LOCAL_USER_ROLES;
COUNT(1)
----------
465748
At this stage, the DB verify utility can also be used to see the corrupted blocks' information:
sandbox:dev> dbv file=/ORACLE/data/datafiles/dev/data11/aol_data17.dbf
blocksize=8192
....
....
DBV-00200: Block, dba 969120643, already marked corrupted
DBV-00200: Block, dba 969120644,
already marked corrupted
DBV-00200: Block, dba 969120645, already marked corrupted
DBV-00200: Block, dba 969120646, already
marked corrupted
DBV-00200: Block, dba 969120647, already marked corrupted
DBV-00200: Block, dba 969120648, already marked
corrupted
DBVERIFY - Verification complete
Total Pages Examined : 256000
Total Pages Processed (Data) : 79791
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 32514
Total Pages Failing (Index): 0
Total Pages Processed
(Other): 137041
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 6654
Total Pages
Marked Corrupt : 1934
Total Pages Influx : 0
Highest block SCN : 6045645666279 (1407.2626680807)
sandbox:dev>
dbv file=/ORACLE/data/datafiles/dev/data11/aol_data17.dbf blocksize=8192 2>&1 | grep DBV-00200 | wc -l
1934
sandbox:dev>
The Effect:
As a result of this workaround, the affected users were able to login through the self service applications URL
and also able to see their assigned responsbilities.
Additionally, the apache access log showed the following depicting that application logic was flowing as intended:
172.18.2.149 - - [06/Jun/2007:14:16:37 -0400] "POST /OA_HTML/fndvald.jsp
HTTP/1.1" 302 260
172.18.2.149 - - [06/Jun/2007:14:16:40 -0400] "GET /OA_HTML/OA.jsp?OAFunc=OAHOMEPAGE HTTP/1.1"
200 21284
172.18.2.149 - - [06/Jun/2007:14:16:41 -0400] "GET /OA_HTML/cabo/styles/cache/oracle-desktop-custom-2_2_22-en-ie-6-windows.css
HTTP/1.1" 200 33224
172.18.2.149 - - [06/Jun/2007:14:16:41 -0400] "GET /OA_HTML/cabo/oajsLibs/oafcore510CU3.js HTTP/1.1"
200 25771
172.18.2.149 - - [06/Jun/2007:14:16:41 -0400] "GET /OA_MEDIA/FNDHOMEBRAND_MED.gif HTTP/1.1" 200 4553
172.18.2.149
- - [06/Jun/2007:14:16:41 -0400] "GET /OA_HTML/cabo/images/warnl.gif HTTP/1.1" 200 977
172.18.2.149 - - [06/Jun/2007:14:16:42
-0400] "GET /OA_HTML/blank.html HTTP/1.1" 200 98
172.18.2.149 - - [06/Jun/2007:14:16:42 -0400] "GET /OA_HTML/blank.html
HTTP/1.1" 200 98
172.18.2.149 - - [06/Jun/2007:14:16:42 -0400] "GET /OA_HTML/blank.html HTTP/1.1" 200 98
172.18.2.149
- - [06/Jun/2007:14:16:42 -0400] "GET /OA_HTML/cabo/images/cache/en/bCustomAppsNavLinkgPPb.gif H1.1" 200 538
172.18.2.149
- - [06/Jun/2007:14:16:42 -0400] "GET /OA_MEDIA/fwkhp_folder.gif HTTP/1.1" 200 639
172.18.2.149 - - [06/Jun/2007:14:16:42
-0400] "GET /OA_HTML/cabo/images/cache/en/bCustomLinkgPLN.gif HTTP/1.1" 200 528
172.18.2.149 - - [06/Jun/2007:14:30:11
-0400] "GET /OA_HTML/AppsLocalLogin.jsp HTTP/1.1" 200 6647
172.18.2.149
- - [06/Jun/2007:14:30:19 -0400] "GET
/OA_HTML/OA.jsp?OAFunc=OAHOMEPAGE&akRegionApplicationId=0&navRespId=50958&navRespAppId=660&navSecGrpId=0&transactionid=666561749&oapc=3
HTTP/1.1" 200 23405
172.18.2.149 - - [06/Jun/2007:14:30:19 -0400] "GET /OA_MEDIA/fwkhp_sswafunc.gif HTTP/1.1"
200 659
172.18.2.149 - - [06/Jun/2007:14:31:45 -0400] "GET /OA_HTML/OALogout.jsp?menu=Y HTTP/1.1" 302 270
172.18.2.149
- - [06/Jun/2007:14:31:45 -0400] "GET /oa_servlets/oracle.apps.fnd.sso.AppsLogout HTTP/1.1" 302 308
172.18.2.149
-
- [06/Jun/2007:14:31:45 -0400] "GET
/OA_HTML/AppsLocalLogout.jsp?returnUrl=%2FOA_HTML%2FAppsLocalLogin.jsp%3FcancelUrl%3D%2FOA_HTML%2FAppsLocalLogin.jsp HTTP/1.1" 302 293
172.18.2.149 - - [06/Jun/2007:14:31:45 -0400]
"GET /OA_HTML/AppsLocalLogin.jsp?cancelUrl=/OA_HTML/AppsLocalLogin.jsp&langCode=US&username=GVERMA
HTTP/1.1" 200 6916
The Learning
- The
regular debugging mechanisms may not be helpful in troubleshooting the
issue at hand and one may have to rely on some basic checks
- Sometimes simple things like block corruption to basic workflow tables can cause critical user functions to not work
- Oracle
supplied packages like DBMS_REPAIR may be used to analyze and fix such
block corruptions, especially when it is acceptable to skip soft
corrupted blocks