Troubleshooting: How block corruption of Workflow tables can cause Self Service Login not to work

Preface


This troubleshooting case study talks about a scenario in which the Self service login was not working due to a weird block corruption scenario in one of the workflow tables.

The scenario was that the Self Service login page was not working for a few users in a cloned instance. Even though
the forms URL was working e.g. http://dev.mycompany.com/dev60cgi/f60cgi,
the self service URL http://dev.mycompany.com/OA_HTML/AppsLocalLogin.jsp
was not working at all for a few users. Furthermore, the usual debugging techniques drew a blank.



The
observation was baffling. While there could be many reasons for it,
getting
to the root cause was difficult and involved some guess work related to
what 11i resources are called for when a user login is attemped. This articles outlines the troubleshooting approach.

Applies to:


Oracle Self-Service Web Applications - Version: 11.0 to 12.0.
Oracle Applications 11i, Workflow, Oracle RDBMS 9i/10g

Troubleshooting Details


The Problem:

In a cloned 11i Apps Instance, for a few users, even though the forms URL was
working e.g. http://dev.mycompany.com/dev60cgi/f60cgi, the self service URL
http://dev.mycompany.com/OA_HTML/AppsLocalLogin.jsp was not working.
This was disconcerting since they needed to access some self service functions specifically.

The only visible message was:

"You have encountered an unexpected error. Please contact the System Administrator for assistance."

The Approach:

Firstly, the user's password was changed using FNDCPASS utility:


sandbox:dev> FNDCPASS apps/$apps_pass 0 Y system/$sys_pass USER GVERMA welcome1
Log filename : L2355077.log

Report
filename : O2355077.out

sandbox:dev> more L2355077.log
+---------------------------------------------------------------------------+
Application
Object Library: Version : 11.5.0 - Development

Copyright (c) 1979, 1999, Oracle Corporation. All rights reserved.
module:

+---------------------------------------------------------------------------+

Current system time is 06-JUN-2007 11:21:16

+---------------------------------------------------------------------------+

+---------------------------------------------------------------------------+
Concurrent
request completed successfully
Current system time is 06-JUN-2007 11:21:16

+---------------------------------------------------------------------------+


However, resetting the user's application password did not work.

The DB alert log was checked. Nothing strange showed
up there.

Debugging at Apache and Jserv level was turned on and apache error_log, access_log and jserv.log was monitored while replicating the issue, but that only yielded this in the access_log:


172.18.2.149 - - [06/Jun/2007:13:09:34 -0400] "GET 
/OA_HTML/AppsLocalLogout.jsp?returnUrl=%2FOA_HTML%2FAppsLocalLogin.jsp%3FcancelUrl%3D%
2FOA_HTML%2FAppsLocalLogin.jsp HTTP/1.1" 302 293
172.18.2.149 - - [06/Jun/2007:13:09:34 -0400] "GET
/OA_HTML/AppsLocalLogin.jsp?cancelUrl=/OA_HTML/AppsLocalLogin.jsp&langCode=US&username=GVERMA
HTTP/1.1" 200 6910
172.18.2.149 - - [06/Jun/2007:13:09:41 -0400] "POST /OA_HTML/fndvald.jsp HTTP/1.1" 302 260
172.18.2.149 - - [06/Jun/2007:13:09:42 -0400] "GET /OA_HTML/OA.jsp?OAFunc=OAHOMEPAGE HTTP/1.1" 200 1430
172.18.2.149 - - [06/Jun/2007:13:13:34-0400] "GET /OA_HTML/fndvald.jsp HTTP/1.1" 200 190
172.18.2.149 - - [06/Jun/2007:13:31:09 -0400] "GET /OA_HTML/AppsLocalLogin.jsp HTTP/1.1" 200 6647


It
was as if the flow of control was being snapped totally after the login
page, without any further activity at the OA Framework
level at all.

Since forms URL was working and the full set of responsibilities was visible, one of the possible resources
which was being accessed could have been the workflow tables for tying up responsbilities with workflow roles.

Based on this, a quick check to Workflow related tables revealed this error:



SQL> select count(1) from WF_LOCAL_USER_ROLES;

select count(1) from WF_LOCAL_USER_ROLES

*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 231, block # 235273)
ORA-01110:
data file 1254:
'/ORACLE/data/datafiles/dev/data11/aol_data17.dbf'
ORA-26040: Data block was loaded using the NOLOGGING option


So it seemed that either Production instance had part of APPLSYS.WF_LOCAL_USER_ROLES in NOLOGGING mode.
The issue was rectified in Production by using


SQL> ALTER TABLE WF_LOCAL_USER_ROLES MODIFY PARTITION <partition_name> LOGGING;


One of the options was to re-import the APPLSYS.WF_LOCAL_USER_ROLES from Production, which would have involved a downtime.

A
better analysis would be to find out the extent of damage done to the table. For this, the DBMS_REPAIR package would be useful.
Reporting not only indicates what is wrong with a block, but also identifies the associated repair directive.

The entire
list of options include:

block corruption list of options.gif:

There
are two levels of block corruption: Hard and Soft. Soft corrupted
blocks can be set to be ignored while reading the table's contents.


sandbox:dev> sqlplus "/ as sysdba"


SQL*Plus: Release 9.2.0.7.0 - Production on Wed Jun 6 14:21:55 2007
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.7.0 - 64bit Production

With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.7.0 - Production

SQL> show user
USER is "SYS"


Create the necessary aixiliary tables for identifying corrupt blocks in the table in question:


SQL> exec DBMS_REPAIR.ADMIN_TABLES (
TABLE_NAME => 'REPAIR_TABLE',

TABLE_TYPE => dbms_repair.repair_table,
ACTION => dbms_repair.create_action,
TABLESPACE => 'USERS');

PL/SQL
procedure successfully completed.

SQL> desc repair_table
Name Null? Type
--------------------------------------    -------- ------------
OBJECT_ID                                 NOT
NULL NUMBER
TABLESPACE_ID                            
NOT NULL NUMBER
RELATIVE_FILE_ID                          NOT
NULL NUMBER
BLOCK_ID                                 
NOT NULL NUMBER
CORRUPT_TYPE                             
NOT NULL NUMBER
SCHEMA_NAME                              
NOT NULL VARCHAR2(30)
OBJECT_NAME                              
NOT NULL VARCHAR2(30)
BASEOBJECT_NAME                                   
VARCHAR2(30)
PARTITION_NAME                                    
VARCHAR2(30)
CORRUPT_DESCRIPTION                                VARCHAR2(2000)

REPAIR_DESCRIPTION                                 VARCHAR2(200)

MARKED_CORRUPT                           
NOT NULL VARCHAR2(10)
CHECK_TIMESTAMP                          
NOT NULL DATE
FIX_TIMESTAMP                                     
DATE
REFORMAT_TIMESTAMP                                 DATE


Check
the specified objects and populates the repair table with information
about corruptions and repair directives.Validation consists of block
checking all blocks in the object. 


SQL> var c number

SQL> exec dbms_repair.check_object(
schema_name=>'APPLSYS',

object_name=>'WF_LOCAL_USER_ROLES',
corrupt_count=>:c);

PL/SQL procedure successfully completed.


See the number of corrupted blocks:


SQL> select :c from dual;

:C
----------
1934


Check the level of corruption -- either Soft of Hard:


SQL> l
1 select distinct SCHEMA_NAME, OBJECT_NAME, CORRUPT_TYPE,
PARTITION_NAME, repair_DESCRIPTION, MARKED_CORRUPT
2* from repair_table
SQL> /

SCHEMA_NAM OBJECT_NAME CORRUPT_TYPE
PARTITION_NAME REPAIR_DESCRIPTION MARKED_COR
---------- ------------------------------ ------------ ------------------------------
------------------------------ ----------
APPLSYS WF_LOCAL_USER_ROLES 6148 HZ_PARTY mark block software corrupt TRUE

SQL>


This procedure fixes the corrupt blocks in specified objects based on information in the repair table that
was previously generated by the check_object procedure.

Prior
to effecting any change to a block, the block is checked to
ensure the block is still corrupt. Corrupt blocks are repaired by
marking the block software corrupt. When a repair is effected, the
associated row in the repair table is updated with a fix timestamp.


SQL> exec DBMS_REPAIR.FIX_CORRUPT_BLOCKS (
SCHEMA_NAME => 'APPLSYS',
OBJECT_NAME=> 'WF_LOCAL_USER_ROLES', 
OBJECT_TYPE => dbms_repair.table_object, 
REPAIR_TABLE_NAME => 'REPAIR_TABLE', 
FIX_COUNT=> :c);
PL/SQL procedure successfully completed.


Enable
the skipping of corrupt blocks during index and table scans of the
specified object. When the object is a table, skip applies to the table
and its indexes.


SQL> exec DBMS_REPAIR.SKIP_CORRUPT_BLOCKS ( 
SCHEMA_NAME=>'applsys',

OBJECT_NAME => 'WF_LOCAL_USER_ROLES',
OBJECT_TYPE => dbms_repair.table_object,
FLAGS => dbms_repair.skip_flag);

PL/SQL procedure successfully completed.


The corrupt blocks get skipped now:


SQL> select count(1) from applsys.WF_LOCAL_USER_ROLES;

COUNT(1)
----------
465748


At this stage, the DB verify utility can also be used to see the corrupted blocks' information:


sandbox:dev> dbv file=/ORACLE/data/datafiles/dev/data11/aol_data17.dbf
blocksize=8192
....
....

DBV-00200: Block, dba 969120643, already marked corrupted
DBV-00200: Block, dba 969120644,
already marked corrupted
DBV-00200: Block, dba 969120645, already marked corrupted
DBV-00200: Block, dba 969120646, already
marked corrupted
DBV-00200: Block, dba 969120647, already marked corrupted
DBV-00200: Block, dba 969120648, already marked
corrupted

DBVERIFY - Verification complete

Total Pages Examined : 256000
Total Pages Processed (Data) : 79791

Total Pages Failing (Data) : 0
Total Pages Processed (Index): 32514
Total Pages Failing (Index): 0
Total Pages Processed
(Other): 137041
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 6654
Total Pages
Marked Corrupt : 1934
Total Pages Influx : 0
Highest block SCN : 6045645666279 (1407.2626680807)

sandbox:dev>
dbv file=/ORACLE/data/datafiles/dev/data11/aol_data17.dbf blocksize=8192 2>&1 | grep DBV-00200 | wc -l
1934
sandbox:dev>

The Effect:


As a result of this workaround, the affected users were able to login through the self service applications URL
and also able to see their assigned responsbilities.

Additionally, the apache access log showed the following depicting that application logic was flowing as intended:


172.18.2.149 - - [06/Jun/2007:14:16:37 -0400] "POST /OA_HTML/fndvald.jsp
HTTP/1.1" 302 260
172.18.2.149 - - [06/Jun/2007:14:16:40 -0400] "GET /OA_HTML/OA.jsp?OAFunc=OAHOMEPAGE HTTP/1.1"
200 21284
172.18.2.149 - - [06/Jun/2007:14:16:41 -0400] "GET /OA_HTML/cabo/styles/cache/oracle-desktop-custom-2_2_22-en-ie-6-windows.css
HTTP/1.1" 200 33224
172.18.2.149 - - [06/Jun/2007:14:16:41 -0400] "GET /OA_HTML/cabo/oajsLibs/oafcore510CU3.js HTTP/1.1"
200 25771
172.18.2.149 - - [06/Jun/2007:14:16:41 -0400] "GET /OA_MEDIA/FNDHOMEBRAND_MED.gif HTTP/1.1" 200 4553
172.18.2.149
- - [06/Jun/2007:14:16:41 -0400] "GET /OA_HTML/cabo/images/warnl.gif HTTP/1.1" 200 977
172.18.2.149 - - [06/Jun/2007:14:16:42
-0400] "GET /OA_HTML/blank.html HTTP/1.1" 200 98
172.18.2.149 - - [06/Jun/2007:14:16:42 -0400] "GET /OA_HTML/blank.html
HTTP/1.1" 200 98
172.18.2.149 - - [06/Jun/2007:14:16:42 -0400] "GET /OA_HTML/blank.html HTTP/1.1" 200 98
172.18.2.149
- - [06/Jun/2007:14:16:42 -0400] "GET /OA_HTML/cabo/images/cache/en/bCustomAppsNavLinkgPPb.gif H1.1" 200 538
172.18.2.149
- - [06/Jun/2007:14:16:42 -0400] "GET /OA_MEDIA/fwkhp_folder.gif HTTP/1.1" 200 639
172.18.2.149 - - [06/Jun/2007:14:16:42
-0400] "GET /OA_HTML/cabo/images/cache/en/bCustomLinkgPLN.gif HTTP/1.1" 200 528
172.18.2.149 - - [06/Jun/2007:14:30:11
-0400] "GET /OA_HTML/AppsLocalLogin.jsp HTTP/1.1" 200 6647
172.18.2.149
- - [06/Jun/2007:14:30:19 -0400] "GET
/OA_HTML/OA.jsp?OAFunc=OAHOMEPAGE&akRegionApplicationId=0&navRespId=50958&navRespAppId=660&navSecGrpId=0&transactionid=666561749&oapc=3
HTTP/1.1" 200 23405
172.18.2.149 - - [06/Jun/2007:14:30:19 -0400] "GET /OA_MEDIA/fwkhp_sswafunc.gif HTTP/1.1"
200 659
172.18.2.149 - - [06/Jun/2007:14:31:45 -0400] "GET /OA_HTML/OALogout.jsp?menu=Y HTTP/1.1" 302 270
172.18.2.149
- - [06/Jun/2007:14:31:45 -0400] "GET /oa_servlets/oracle.apps.fnd.sso.AppsLogout HTTP/1.1" 302 308
172.18.2.149
-
- [06/Jun/2007:14:31:45 -0400] "GET
/OA_HTML/AppsLocalLogout.jsp?returnUrl=%2FOA_HTML%2FAppsLocalLogin.jsp%3FcancelUrl%3D%2FOA_HTML%2FAppsLocalLogin.jsp HTTP/1.1" 302 293
172.18.2.149 - - [06/Jun/2007:14:31:45 -0400]
"GET /OA_HTML/AppsLocalLogin.jsp?cancelUrl=/OA_HTML/AppsLocalLogin.jsp&langCode=US&username=GVERMA
HTTP/1.1" 200 6916

The Learning


  • The
    regular debugging mechanisms may not be helpful in troubleshooting the
    issue at hand and one may have to rely on some basic checks
  • Sometimes simple things like block corruption to basic workflow tables can cause critical user functions to not work
  • Oracle
    supplied packages like DBMS_REPAIR may be used to analyze and fix such
    block corruptions, especially when it is acceptable to skip soft
    corrupted blocks

Comments:

Verma, Your posts are great. Do you think because of some bug the corruption happened. we also just noticed that WF table was corrupted in cloned instance. We are running on 9.2.0.5 with 11.5.10.1 If so any bug# would be great. Thanks

Posted by Shyam on April 04, 2008 at 06:23 AM EDT #

Post a Comment:
  • HTML Syntax: NOT allowed
About

bocadmin_ww

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today