X

This blog introduces how to diagnose Agile PLM related issues.

  • June 3, 2013

Background task in PMON

Jie Chen
Senior Principal Technical Support Engineer

We all understand that the principle task of PMON is to clean resource after abnormal end of connection process, monitor/recover the background process and register instance to a Lisener. This article will analyze the background task.


Clean Resource of Process Disconnection.


We debug the process with event 10246. As 10246 is not feasible to alter system dynamically, we start instance via pfile and add below parameter to initSID.ora file.

event='10246 trace name context forever, level 10'

Start instance and ensure the Event is working.

[oracle@localhost ~]$ sqlplus sys/oracle as sysdba
idle> startup
ORACLE instance started.
...
Database mounted.
Database opened.
idle> show parameter event
NAME TYPE VALUE
-----------------------------------------------------------------
event string 10246 trace name context forever, level 10
xml_db_events string enable

Then connect from remote client.

C:\Users\JC>whoami
chen-pc\jc
C:\Users\JC>net view /domain
Domain
-------------------------------------
ORADEV
命令成功完成。
C:\Users\JC>sqlplus agile/tartan@lxagile9
SQL> select 1 from dual;
1
----------
1


Because of Dedicated Server connection mode, we use below SQL to get the Dedicated Server Connection process, client process, session id, session serial and session_paddr.

SQL> select p.spid server_ospid, s.process client_ospid, s.sid session_id, s.ser
ial# session_serial, s.paddr session_paddr from v$process p, v$session s where p
.addr=s.paddr and s.sid=(select sid from v$mystat where rownum=1);
SERVER_OSPID CLIENT_OSPID SESSION_ID SESSION_SERIAL SESSION_PADDR
------------------------------------------------------------------------
21374 7640:7644 11 12 3DFA87FC


In above result we see Client_OSPID is shown as 7640:7644 on Windows. If on Unix, we will see a single number as the SQLPLUS process. On Windows, 7460 is the Process ID of SQLPLUS and 7644 is the Thread ID. To confirm this, we can use Process Explorer to verify.


Let's go to server and kill process 21374. This process is the one which service the client's connection.

idle> host
[oracle@localhost ~]$ ps -ef|grep 21374
oracle 21374 1 0 17:47 ? 00:00:00 oracleagile9 (LOCAL=NO)
oracle 21397 21381 0 17:48 pts/1 00:00:00 grep 21374
[oracle@localhost ~]$ kill -9 21374
[oracle@localhost ~]$ exit


On Client, DML will fail.

SQL> select 1 from dual;
select 1 from dual
*
第 1 行出现错误:
ORA-03113: 通信通道的文件结尾
进程 ID: 21374
会话 ID: 11 序列号: 12

At this moment, Oracle will issue INTERRUPT exception and hand over to PMON to clean resource. We see the task in trace file.

idle> show parameter background_dump_dest
NAME TYPE VALUE
---------------------------------------------------------
background_dump_dest string /u01/app/oracle/diag/rdbms/agile9/agile9/trace

Trace file /u01/app/oracle/diag/rdbms/agile9/agile9/trace/agile9_pmon_21305.trc
...
*** 2013-03-16 17:49:20.938
marked process 0x3dfa87fc pid=22 serial=5 ospid = 21374 dead
client details:
O/S info: user: JC, term: CHEN-PC, ospid: 7640:7644
machine: ORADEV\CHEN-PC program: sqlplus.exe
application name: SQL*Plus, hash value=3669949024
*** 2013-03-16 17:49:20.938
deleting process 0x3dfa87fc pid=22 serial=5 priority=0
deleting session 0x3d970a50 sid=11 serial=12
deletion of process 3dfa87fc pid=22 seq=5 successful


We see PMON delete the session and clean physical connection, release resource as well.


Monitor Background Process

We simulate a case that we kill the most important process manually, DBWn. Once this Process ends, the instance will stop immediately.

idle> host
[oracle@localhost ~]$ ps -ef | grep ora_ | grep $ORACLE_SID
oracle 21305 1 0 17:45 ? 00:00:00 ora_pmon_agile9
oracle 21307 1 0 17:45 ? 00:00:00 ora_vktm_agile9
...
oracle 21323 1 0 17:45 ? 00:00:00 ora_dbw0_agile9
oracle 21325 1 0 17:45 ? 00:00:00 ora_lgwr_agile9
oracle 21327 1 0 17:45 ? 00:00:00 ora_ckpt_agile9
oracle 21329 1 0 17:45 ? 00:00:00 ora_smon_agile9
...
oracle 21508 1 0 18:01 ? 00:00:00 ora_w000_agile9

[oracle@localhost ~]$ kill -9 21323
[oracle@localhost ~]$ exit


Then we issue DML query and should get ORA-03135. The Process ID 21341 is the connection process itself.

idle> select 1 from dual;
select 1 from dual
*
ERROR at line 1:
ORA-03135: connection lost contact
Process ID: 21341
Session ID: 125 Serial number: 5


Read PMON's trace file, we see PMON successfully detect that DBW0 ends.

Background process DBW0 found dead
*** 2013-03-16 18:02:36.901
Oracle pid = 10
OS pid (from detached process) = 21323
OS pid (from process state) = 21323
dtp = 0x2000d868, proc = 0x3dfa05dc
Dump of memory from 0x2000D868 to 0x2000D8A0
2000D860 00000040 3DFA05DC [@......=]
2000D870 00000000 00000000 30574244 00000200 [........DBW0....]
2000D880 0000534B 0D898AFA B6EA36C0 0000534B [KS.......6..KS..]
2000D890 0D898AFA 00000001 2721298A 00010000 [.........)!'....]
Dump of memory from 0x3DFA05DC to 0x3DFA10B4
3DFA05D0 00000301 [....]
3DFA05E0 00000302 00000000 3DFA05DC 3DFA05EC [...........=...=]
3DFA05F0 3DFA05EC 00000000 00000000 00000000 [...=............]
3DFA0600 00000000 00000000 00000019 3CE4A084 [...............<]
3DFA0610 3D134570 3DB13248 3D133848 00000000 [pE.=H2.=H8.=....]
3DFA0620 3D1338BC 3D1338BC 3D134560 00000601 [.8.=.8.=`E.=....]
3DFA0630 00000000 3D9808D8 3DB13248 0000000A [.......=H2.=....]
3DFA0640 00000000 0000000A 00000001 874B9F64 [............d.K.]
3DFA0650 3A79CCE4 00000000 00000000 00000000 [..y:............]
3DFA0660 00000000 3D778FDC 3D7790C0 00000000 [......w=..w=....]
3DFA0670 00000000 00000000 00000000 00000000 [................]
Repeat 67 times
3DFA10A0 3DFA10A0 3DFA10A0 00000000 3DFA10AC [...=...=.......=]
3DFA10B0 3DFA10AC [...=]
*** 2013-03-16 18:02:36.903
PMON (ospid: 21305): terminating the instance due to error 471
*** 2013-03-16 18:02:46.933
Instance termination failed to kill one or more processes
ksuitm_check: OS PID=21341 is still alive


PMON then close almost all process except 21341, which is the current connection process on server.

We see same error in alert.log file as well.

Sat Mar 16 18:02:36 2013
PMON (ospid: 21305): terminating the instance due to error 471
Termination issued to instance processes. Waiting for the processes to exit
Sat Mar 16 18:02:46 2013
Instance termination failed to kill one or more processes
Instance terminated by PMON, pid = 21305


Above information mentions error 471, it is actually ORA-000471.

ORA-00471: DBWR process terminated with error
Cause:

The database writer process died
Action:

Warm start instance


Register Instance to Listener


Before we start, we need to start Oracle database, stop Listener and delete listener.ora. That means we are using dynamic register for Listener.

LSNRCTL> stop
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
The command completed successfully
LSNRCTL>


We get the PMON process in OS level.

idle> select SPID,PROGRAM from v$process where PROGRAM like '%PMON%';
SPID PROGRAM
------------------------ ------------------------------------------------
29172 oracle@localhost.localdomain (PMON)


Now debug with oradebug

idle> oradebug setospid 29172
Oracle pid: 2, Unix process pid: 29172, image: oracle@localhost.localdomain (PMON)
idle> oradebug Event 10257 trace name context forever, level 16
Statement processed.


Again, we start Listener

LSNRCTL> start
Starting /u01/app/oracle/product/11.2.0/dbhome_1/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Log messages written to /u01/app/oracle/diag/tnslsnr/localhost/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date 17-MAR-2013 11:33:07
Uptime 0 days 0 hr. 0 min. 0 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Log File /u01/app/oracle/diag/tnslsnr/localhost/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))
The listener supports no services
The command completed successfully


Issue "status" command several times we see there is still no instance registered to Listener.

LSNRCTL> status
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date 17-MAR-2013 11:33:07
Uptime 0 days 0 hr. 0 min. 3 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Log File /u01/app/oracle/diag/tnslsnr/localhost/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))
The listener supports no services
The command completed successfully


After a while, we find there is an instance registered automatically.

LSNRCTL> status
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date 17-MAR-2013 11:33:07
Uptime 0 days 0 hr. 2 min. 19 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Log File /u01/app/oracle/diag/tnslsnr/localhost/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))
Services Summary...
Service "agile9" has 1 instance(s).
Instance "agile9", status READY, has 1 handler(s) for this service...
The command completed successfully


End oradebug, let's check PMON's trace log. Attention you will see below two lines in such log. It shows the PMON automatically registered the instance to Listener after 60 seconds (defult value is 60 seconds).


  • kmmgdnu: agile9


  • kmmlrl: instance load 1

idle> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/agile9/agile9/trace/agile9_pmon_29172.trc
idle> oradebug Event 10257 trace name context OFF;
Statement processed.

*** 2013-03-17 11:32:46.658
kmmlrl: update for process drop delta: 52 52 24 26 149
kmmgdnu: agile9
goodness=0, delta=1,
flags=0x4:unblocked/not overloaded, update=0x6:G/D/-
kmmlrl: 24 processes
kmmlrl: instance load 1
kmmlrl: nsgr update returned 0


Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.