Memory leak in scdpmd

The scdpmd (Sun Cluster disk path monitor daemon) has a memory leak when the reboot_on_path_failure flag is enabled. This is a known issue, reported as bug 6563949, and a fix is planned. The workaround is to leave reboot_on_path_failure at its default setting, which is disabled. Only Sun Cluster 3.2 is affected, because the flag is a new feature of that release.
For details, see Administering Disk-Path Monitoring.
Update 8.Feb.2008: Bug 6563949 is now fixed in the following patches:
126106-04 Sun Cluster 3.2: CORE patch for Solaris 10
126107-04 Sun Cluster 3.2: CORE patch for Solaris 10_x86
126105-04 Sun Cluster 3.2: CORE patch for Solaris 9
Update 30.Jun.2008: Bug 6682663 can prevent the reboot. This is fixed in revision -15 of the Sun Cluster 3.2 CORE patches listed above.



How to identify whether reboot_on_path_failure is enabled?

t2000d# scdpm -p all:all
t2000e:reboot_on_path_failure enabled
t2000e:/dev/did/rdsk/d1 Ok
t2000e:/dev/did/rdsk/d2 Ok
t2000e:/dev/did/rdsk/d4 Ok
t2000e:/dev/did/rdsk/d5 Ok
t2000e:/dev/did/rdsk/d6 Ok
t2000e:/dev/did/rdsk/d7 Ok
t2000d:reboot_on_path_failure enabled
t2000d:/dev/did/rdsk/d10 Ok
t2000d:/dev/did/rdsk/d11 Ok
t2000d:/dev/did/rdsk/d13 Ok
t2000d:/dev/did/rdsk/d14 Ok
t2000d:/dev/did/rdsk/d6 Ok
t2000d:/dev/did/rdsk/d7 Ok
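
On clusters with many nodes the listing gets long, so it can help to filter it. The following is a minimal sketch, not part of Sun Cluster: the function name nodes_with_reboot_flag is made up for illustration, and it assumes the output format is exactly as shown above (node name, colon, flag name, state).

```shell
# Print the names of the nodes whose reboot_on_path_failure flag is enabled.
# Reads the output of `scdpm -p all:all` on standard input.
# (Hypothetical helper; assumes the "node:reboot_on_path_failure state" format.)
nodes_with_reboot_flag() {
    awk -F: '$2 ~ /^reboot_on_path_failure +enabled/ { print $1 }'
}

# On a cluster node:
#   scdpm -p all:all | nodes_with_reboot_flag
```

An empty result would mean the flag is disabled on every node that was listed.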



How to find out if scdpmd consumes too much memory?

t2000d# ps -ef | grep scdpmd
root 5355 1 0 Aug 20 ? 390:26 /usr/cluster/lib/sc/scdpmd
t2000d# prstat
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
5355 root 952M 8520K sleep 59 0 6:29:59 4.4% scdpmd/14
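
In the prstat output above, SIZE (952M) is the value that grows with the leak, while RSS stays small. To compare such a value against a limit in a script, something like the following sketch could be used. The helper names and the 512M limit are assumptions for illustration only, the code needs a POSIX shell such as ksh or bash (not the old Bourne /bin/sh), and it assumes prstat prints whole numbers with a K, M or G suffix.

```shell
# Convert a prstat-style size (e.g. 8520K, 952M, 2G) to kilobytes.
size_to_kb() {
    case "$1" in
        *K) echo "${1%K}" ;;
        *M) echo $(( ${1%M} * 1024 )) ;;
        *G) echo $(( ${1%G} * 1024 * 1024 )) ;;
        *)  echo "unrecognized size: $1" >&2; return 1 ;;
    esac
}

# Succeed (exit status 0) when the first size is larger than the second.
size_exceeds() {
    [ "$(size_to_kb "$1")" -gt "$(size_to_kb "$2")" ]
}

# Example with the SIZE value shown above and an arbitrary limit:
#   size_exceeds 952M 512M && echo "scdpmd is above the limit"
```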



How to disable the reboot_on_path_failure flag?

t2000d# clnode set -p reboot_on_path_failure=disabled t2000d t2000e
t2000d# scdpm -p all:all
t2000e:reboot_on_path_failure disabled
t2000e:/dev/did/rdsk/d1 Ok
t2000e:/dev/did/rdsk/d2 Ok
t2000e:/dev/did/rdsk/d4 Ok
t2000e:/dev/did/rdsk/d5 Ok
t2000e:/dev/did/rdsk/d6 Ok
t2000e:/dev/did/rdsk/d7 Ok
t2000d:reboot_on_path_failure disabled
t2000d:/dev/did/rdsk/d10 Ok
t2000d:/dev/did/rdsk/d11 Ok
t2000d:/dev/did/rdsk/d13 Ok
t2000d:/dev/did/rdsk/d14 Ok
t2000d:/dev/did/rdsk/d6 Ok
t2000d:/dev/did/rdsk/d7 Ok



How to restart the scdpm service to release the leaked memory?

t2000d# svcadm restart svc:/system/cluster/scdpm:default
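
If the flag has to stay enabled until the patch is installed, the restart could be run only when the daemon has actually grown. This is a rough sketch under assumptions: restart_if_over is a hypothetical helper, the 524288 KB (512 MB) limit is arbitrary, and the example relies on `ps -o vsz=` reporting the virtual size in kilobytes.

```shell
# Run the given command only when the measured size exceeds the limit.
# Usage: restart_if_over <size_kb> <limit_kb> <command> [args...]
restart_if_over() {
    vsz_kb=$1
    limit_kb=$2
    shift 2
    if [ "$vsz_kb" -gt "$limit_kb" ]; then
        "$@"
    fi
}

# Example on a cluster node (for instance from cron):
#   pid=$(pgrep -x scdpmd)
#   restart_if_over $(ps -o vsz= -p "$pid") 524288 \
#       svcadm restart svc:/system/cluster/scdpm:default
```

Passing the command as arguments keeps the size check separate from the restart itself, so the check can be tried out with a harmless command such as echo first.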


Additional information and best practices are available in
Infodoc 1004119.1: Sun[TM] Cluster 3.2 Disk Path Monitoring and how to test for losing path to storage

Comments:

Anyway, the reboot_on_path_failure is useless when ZFS is used. It's for outdated SVM and VxVM but who needs them now?

http://napobo3.blogspot.com/2007/10/why-rebootonpathfailure-is-useless-when.html

Posted by Leon Koll on October 09, 2007 at 05:01 AM CEST #

Yes, but a lot of installations are still using SVM or VxVM.
SunAlert 103120 is available for this issue:
http://sunsolve.sun.com/search/document.do?assetkey=1-26-103120-1
More information about the ZFS issue in Document 87255: Solaris[TM] ZFS & Write Failure
http://sunsolve.sun.com/search/document.do?assetkey=1-9-87255-1

Posted by Juergen on November 07, 2007 at 06:35 AM CET #
