Background task in PMON

We all understand that the principle task of PMON is to clean resource after abnormal end of connection process, monitor/recover the background process and register instance to a Lisener. This article will analyze the background task.

Clean Resource of Process Disconnection.

We debug the process with event 10246. As 10246 is not feasible to alter system dynamically, we start instance via pfile and add below parameter to initSID.ora file.

event='10246 trace name context forever, level 10'

Start instance and ensure the Event is working.

[oracle@localhost ~]$ sqlplus sys/oracle as sysdba

idle> startup
ORACLE instance started.
...
Database mounted.
Database opened.


idle> show parameter event

NAME           TYPE    VALUE
-----------------------------------------------------------------
event          string  10246 trace name context forever, level 10
xml_db_events  string  enable

Then connect from remote client.

C:\Users\JC>whoami
chen-pc\jc

C:\Users\JC>net view /domain
Domain

-------------------------------------
ORADEV
命令成功完成。


C:\Users\JC>sqlplus agile/tartan@lxagile9

SQL> select 1 from dual;
         1
----------
         1

Because of Dedicated Server connection mode, we use below SQL to get the Dedicated Server Connection process, client process, session id, session serial and session_paddr.

SQL> select p.spid server_ospid, s.process client_ospid, s.sid session_id, s.ser
ial# session_serial, s.paddr session_paddr from v$process p, v$session s where p
.addr=s.paddr and s.sid=(select sid from v$mystat where rownum=1);

SERVER_OSPID  CLIENT_OSPID  SESSION_ID  SESSION_SERIAL  SESSION_PADDR
------------------------------------------------------------------------
       21374     7640:7644          11              12       3DFA87FC

In above result we see Client_OSPID is shown as 7640:7644 on Windows. If on Unix, we will see a single number as the SQLPLUS process. On Windows, 7460 is the Process ID of SQLPLUS and 7644 is the Thread ID. To confirm this, we can use Process Explorer to verify.

Let's go to server and kill process 21374. This process is the one which service the client's connection.

idle> host
[oracle@localhost ~]$ ps -ef|grep 21374
oracle   21374     1  0 17:47 ?        00:00:00 oracleagile9 (LOCAL=NO)
oracle   21397 21381  0 17:48 pts/1    00:00:00 grep 21374

[oracle@localhost ~]$ kill -9 21374

[oracle@localhost ~]$ exit

On Client, DML will fail.

SQL> select 1 from dual;
select 1 from dual
*
第 1 行出现错误:
ORA-03113: 通信通道的文件结尾
进程 ID: 21374
会话 ID: 11 序列号: 12

At this moment, Oracle will issue INTERRUPT exception and hand over to PMON to clean resource. We see the task in trace file.

idle> show parameter background_dump_dest

NAME                    TYPE   VALUE
---------------------------------------------------------
background_dump_dest  string   /u01/app/oracle/diag/rdbms/agile9/agile9/trace

Trace file /u01/app/oracle/diag/rdbms/agile9/agile9/trace/agile9_pmon_21305.trc
...
*** 2013-03-16 17:49:20.938
marked process 0x3dfa87fc pid=22 serial=5 ospid = 21374 dead
  client details:
    O/S info: user: JC, term: CHEN-PC, ospid: 7640:7644
    machine: ORADEV\CHEN-PC program: sqlplus.exe
    application name: SQL*Plus, hash value=3669949024

*** 2013-03-16 17:49:20.938
deleting process 0x3dfa87fc pid=22 serial=5 priority=0
deleting session 0x3d970a50 sid=11 serial=12
deletion of process 3dfa87fc pid=22 seq=5 successful

We see PMON delete the session and clean physical connection, release resource as well.

Monitor Background Process

We simulate a case that we kill the most important process manually, DBWn. Once this Process ends, the instance will stop immediately.

idle> host
[oracle@localhost ~]$ ps -ef | grep ora_ | grep $ORACLE_SID
oracle   21305     1  0 17:45 ?        00:00:00 ora_pmon_agile9
oracle   21307     1  0 17:45 ?        00:00:00 ora_vktm_agile9
...
oracle   21323     1  0 17:45 ?        00:00:00 ora_dbw0_agile9
oracle   21325     1  0 17:45 ?        00:00:00 ora_lgwr_agile9
oracle   21327     1  0 17:45 ?        00:00:00 ora_ckpt_agile9
oracle   21329     1  0 17:45 ?        00:00:00 ora_smon_agile9
...
oracle   21508     1  0 18:01 ?        00:00:00 ora_w000_agile9


[oracle@localhost ~]$ kill -9 21323
[oracle@localhost ~]$ exit

Then we issue DML query and should get ORA-03135. The Process ID 21341 is the connection process itself.

idle> select 1 from dual;
select 1 from dual
*
ERROR at line 1:
ORA-03135: connection lost contact
Process ID: 21341
Session ID: 125 Serial number: 5

Read PMON's trace file, we see PMON successfully detect that DBW0 ends.

Background process DBW0 found dead

*** 2013-03-16 18:02:36.901
Oracle pid = 10
OS pid (from detached process) = 21323 
OS pid (from process state) = 21323
dtp = 0x2000d868, proc = 0x3dfa05dc
Dump of memory from 0x2000D868 to 0x2000D8A0
2000D860                   00000040 3DFA05DC          [@......=]
2000D870 00000000 00000000 30574244 00000200  [........DBW0....]
2000D880 0000534B 0D898AFA B6EA36C0 0000534B  [KS.......6..KS..]
2000D890 0D898AFA 00000001 2721298A 00010000  [.........)!'....]
Dump of memory from 0x3DFA05DC to 0x3DFA10B4
3DFA05D0                            00000301              [....]
3DFA05E0 00000302 00000000 3DFA05DC 3DFA05EC  [...........=...=]
3DFA05F0 3DFA05EC 00000000 00000000 00000000  [...=............]
3DFA0600 00000000 00000000 00000019 3CE4A084  [...............<]
3DFA0610 3D134570 3DB13248 3D133848 00000000  [pE.=H2.=H8.=....]
3DFA0620 3D1338BC 3D1338BC 3D134560 00000601  [.8.=.8.=`E.=....]
3DFA0630 00000000 3D9808D8 3DB13248 0000000A  [.......=H2.=....]
3DFA0640 00000000 0000000A 00000001 874B9F64  [............d.K.]
3DFA0650 3A79CCE4 00000000 00000000 00000000  [..y:............]
3DFA0660 00000000 3D778FDC 3D7790C0 00000000  [......w=..w=....]
3DFA0670 00000000 00000000 00000000 00000000  [................]
        Repeat 67 times
3DFA10A0 3DFA10A0 3DFA10A0 00000000 3DFA10AC  [...=...=.......=]
3DFA10B0 3DFA10AC                             [...=]            

*** 2013-03-16 18:02:36.903
PMON (ospid: 21305): terminating the instance due to error 471

*** 2013-03-16 18:02:46.933
Instance termination failed to kill one or more processes
ksuitm_check: OS PID=21341 is still alive

PMON then close almost all process except 21341, which is the current connection process on server.

We see same error in alert.log file as well.

Sat Mar 16 18:02:36 2013
PMON (ospid: 21305): terminating the instance due to error 471
Termination issued to instance processes. Waiting for the processes to exit
Sat Mar 16 18:02:46 2013
Instance termination failed to kill one or more processes
Instance terminated by PMON, pid = 21305

Above information mentions error 471, it is actually ORA-000471.

ORA-00471: DBWR process terminated with error
Cause:	 The database writer process died
Action:	 Warm start instance

Register Instance to Listener

Before we start, we need to start Oracle database, stop Listener and delete listener.ora. That means we are using dynamic register for Listener.

LSNRCTL> stop
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
The command completed successfully
LSNRCTL>

We get the PMON process in OS level.

idle> select SPID,PROGRAM from v$process where PROGRAM like '%PMON%';

SPID                     PROGRAM
------------------------ ------------------------------------------------
29172                    oracle@localhost.localdomain (PMON)

Now debug with oradebug

idle> oradebug setospid 29172
Oracle pid: 2, Unix process pid: 29172, image: oracle@localhost.localdomain (PMON)

idle>  oradebug Event 10257 trace name context forever, level 16
Statement processed.

Again, we start Listener

LSNRCTL> start
Starting /u01/app/oracle/product/11.2.0/dbhome_1/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Log messages written to /u01/app/oracle/diag/tnslsnr/localhost/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date                17-MAR-2013 11:33:07
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Log File         /u01/app/oracle/diag/tnslsnr/localhost/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))
The listener supports no services
The command completed successfully

Issue "status" command several times we see there is still no instance registered to Listener.

LSNRCTL> status
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date                17-MAR-2013 11:33:07
Uptime                    0 days 0 hr. 0 min. 3 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Log File         /u01/app/oracle/diag/tnslsnr/localhost/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))
The listener supports no services
The command completed successfully

After a while, we find there is an instance registered automatically.

LSNRCTL> status
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date                17-MAR-2013 11:33:07
Uptime                    0 days 0 hr. 2 min. 19 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Log File         /u01/app/oracle/diag/tnslsnr/localhost/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))
Services Summary...
Service "agile9" has 1 instance(s).
  Instance "agile9", status READY, has 1 handler(s) for this service...
The command completed successfully

End oradebug, let's check PMON's trace log. Attention you will see below two lines in such log. It shows the PMON automatically registered the instance to Listener after 60 seconds (defult value is 60 seconds).

  • kmmgdnu: agile9
  • kmmlrl: instance load 1

idle> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/agile9/agile9/trace/agile9_pmon_29172.trc

idle> oradebug Event 10257 trace name context OFF;
Statement processed.

*** 2013-03-17 11:32:46.658
kmmlrl: update for process drop delta: 52 52 24 26 149
kmmgdnu: agile9
         goodness=0, delta=1,
         flags=0x4:unblocked/not overloaded, update=0x6:G/D/-
kmmlrl: 24 processes
kmmlrl: instance load 1
kmmlrl: nsgr update returned 0


Comments:

Post a Comment:
Comments are closed for this entry.
About


Jie Chen is the L3 memeber in Oracle Agile Support.
This blog focuses on the Maintenance, Diagnosis and Tuning related technical skills.
The technology covers Java/JavaEE, Weblogic, Security, Clustering, and Database of course.

Search

Categories
Archives
« July 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
  
       
Today