Preface
We are currently doing a 10g upgrade for one of our clients and hit upon the idea of pre-staging the 10gR2 CRS + DB technology stack on their RAC servers, which already run 9iR2 RAC on HP Serviceguard. This is purely a downtime reduction technique, and it saved about 5-6 hours. Thankfully, the idea worked, but not before some excitement.
Surprise, surprise..
A week after doing the 10gR2 CRS + DB installation on the pre-production servers, when we were starting the real Database upgrade, we had to bring up the 10gR2 CRS.
I was surprised to see that the 10gR2 CRS services would not come up. We had tried the following four things:
1) Uncommented the crs, css, and evm daemons in /etc/inittab
2) Issued /etc/init.d/init.crs enable
3) Issued /etc/init.d/init.crs start
4) Issued $ORA_CRS_HOME/bin/crsctl start crs
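Step 1 is easy to get wrong by leaving one entry commented. A minimal sketch for double-checking it follows. It assumes the inittab entries reference scripts named init.evmd, init.cssd and init.crsd (the usual 10g naming, but verify against your own file); the function name commented_crs_entries is made up for this illustration.

```shell
# Print the CRS daemon scripts whose inittab entries are still commented out.
# Assumption: each entry references a script named init.evmd, init.cssd or
# init.crsd. The directory and run levels may differ on your platform.
commented_crs_entries() {
    inittab="${1:-/etc/inittab}"
    # Keep only comment lines, then pull out the first daemon script name
    # found on each such line.
    awk '/^[ \t]*#/ && match($0, /init\.(evmd|cssd|crsd)/) {
             print substr($0, RSTART, RLENGTH)
         }' "$inittab" | sort -u
}
```

Calling commented_crs_entries with no argument reads /etc/inittab. Empty output means none of the three entries is commented out; it does not prove the entries exist at all.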
I was pretty aghast. We considered logging a TAR, but on searching Metalink we came across a new command that I had not tried in 10gR1. First, here is what the failed attempts looked like:
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./srvctl status nodeapps -n raclinux1
PRKH-1010 : Unable to communicate with CRS services.
[Communications Error(Native: prsr_initCLSS:[3])]
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./srvctl start nodeapps -n raclinux1
PRKH-1010 : Unable to communicate with CRS services.
[Communications Error(Native: prsr_initCLSS:[3])]
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./srvctl status nodeapps -n raclinux1
PRKH-1010 : Unable to communicate with CRS services.
[Communications Error(Native: prsr_initCLSS:[3])]
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
Redemption
This is when we tried crsctl start resources, and the CRS actually came up:
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./crsctl start resources
Starting resources.
Successfully started CRS resources
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
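The sequence above (start, check, fall back to start resources, check again) could be scripted roughly as follows. This is only a sketch: the function names and the CRS_WAIT_SECS variable are hypothetical, the 60-second default wait is a guess rather than a documented value, and the health check simply matches the three "appears healthy" lines shown in the transcript.

```shell
# Succeeds only if the `crsctl check crs` output on stdin reports
# CSS, CRS and EVM as healthy (the three lines shown in the transcript).
crs_stack_healthy() {
    awk '/^(CSS|CRS|EVM) appears healthy/ { seen[$1] = 1 }
         END { exit !(seen["CSS"] && seen["CRS"] && seen["EVM"]) }'
}

# Hypothetical wrapper: try the normal start first, and issue
# `crsctl start resources` only if the stack is still not healthy.
start_crs_with_fallback() {
    [ -n "$ORA_CRS_HOME" ] || { echo "ORA_CRS_HOME is not set" >&2; return 2; }
    crsctl="$ORA_CRS_HOME/bin/crsctl"

    "$crsctl" start crs
    # The start is asynchronous ("will be started shortly"), so wait a bit.
    # CRS_WAIT_SECS is an assumed knob, not an Oracle setting.
    sleep "${CRS_WAIT_SECS:-60}"
    if "$crsctl" check crs | crs_stack_healthy; then
        return 0
    fi

    "$crsctl" start resources
    "$crsctl" check crs | crs_stack_healthy
}
```

Run as root, start_crs_with_fallback returns 0 when the final check shows all three daemons healthy and non-zero otherwise, so it can gate the next step of an upgrade runbook.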
Conclusion
At this point, I am not sure why the behaviour changed in 10gR2, whether the change was intentional, or whether this is a bug. But I am glad that we have a workaround. Every day brings something new to learn.
Comments (3)
Hi gaurav.verma,
Yesterday I ran into the same issue as yours. I have two nodes in my RAC environment, and one of them cannot start up the CRS. Following your steps, I finally executed the command crsctl start resources, but the CRS was still down.
Why would CRS work well on one node but fail on the other?
I am fairly sure my OCR and voting disk are working well.
Can you give me some advice?
Posted by Kevin.yuan | July 17, 2008 9:36 AM
Recently I faced a problem starting the Oracle application on one node of my galaxy cluster. In the log I found that the CRS daemon had not started after the node booted, so I tried to start it manually but got an error.
Here is the workaround I used to get the CRS services started.
The error I was getting while starting Oracle was:
======================
PRKC-1056 : Failed to get the hostname for node galclus157
PRKH-1010 : Unable to communicate with CRS services.
[Communications Error(Native: prsr_initCLSS:[3])]
======================
When I tried to start crsd manually, the service did not start.
After debugging this error, I found that the CRS service depends on the ucmmd service to start.
So please check whether ucmmd is already running (if not, start it):
===================================
root@galclus157# ps -aef | grep ucmmd
root 2030 1 0 May 12 ? 13:12 ucmmd -r /usr/cluster/lib/ucmm/ucmm_reconf
===================================
Posted by Amit Ranjan Sahu | May 19, 2009 1:27 AM
Recently I faced a problem starting the Oracle application on one node of my galaxy cluster. In the log I found that the CRS daemon had not started after the node booted, so I tried to start it manually but got an error.
Here is the workaround I used to get the CRS services started.
The error I was getting while starting Oracle was:
======================
PRKC-1056 : Failed to get the hostname for node galclus157
PRKH-1010 : Unable to communicate with CRS services.
[Communications Error(Native: prsr_initCLSS:[3])]
======================
root@galclus157# crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
=========================
When I tried to start crsd manually, the service did not start.
After debugging this error, I found that the CRS service depends on the ucmmd service to start.
So please check whether ucmmd is already running (if not, start it):
===================================
root@galclus157# ps -aef | grep ucmmd
root 2030 1 0 May 12 ? 13:12 ucmmd -r /usr/cluster/lib/ucmm/ucmm_reconf
===================================
Posted by Amit Ranjan Sahu | May 19, 2009 1:29 AM
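The ucmmd check from the last comment can be scripted without the usual problem of grep matching its own process. The following is a sketch only: the helper name daemon_listed is invented here, and it assumes that `ps -e -o comm=` on your platform prints one command name (possibly with a path) per line, as POSIX describes.

```shell
# Reads command names on stdin, one per line (as printed by
# `ps -e -o comm=`), and succeeds if any of them has the given basename.
# Because only command names are compared, a running `grep ucmmd`
# never counts as a match.
daemon_listed() {
    awk -v name="$1" '{
             n = split($1, parts, "/")      # strip any leading path
             if (parts[n] == name) found = 1
         }
         END { exit !found }'
}
```

Typical use would be `ps -e -o comm= | daemon_listed ucmmd || echo "ucmmd is not running"`, run before attempting to start the CRS stack.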