Preface
We are currently doing a 10g upgrade for one of our clients and hit upon the idea of pre-staging the 10gR2 CRS + DB technology stack on their RAC servers, which are already running 9iR2 RAC on HP Serviceguard. This is simply a downtime-reduction technique, and it saved us about 5-6 hours. Thankfully, the idea worked, but not without some excitement.
Surprise, surprise...
A week after installing the 10gR2 CRS + DB stack on the pre-production servers, when we started the actual database upgrade, we had to bring up the 10gR2 CRS.
I was surprised to see that the 10gR2 CRS services would not come up. We tried the following four things:
1) Uncommented the crs, css, and evm daemon entries in /etc/inittab
2) Issued /etc/init.d/init.crs enable
3) Issued /etc/init.d/init.crs start
4) Issued $ORA_CRS_HOME/bin/crsctl start crs
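For reference, step 1 can be scripted roughly as follows. This is a minimal sketch, assuming the commented-out inittab entries are the ones that launch the init.evmd, init.cssd and init.crsd scripts (as they typically are on a Linux 10gR2 install). The function name uncomment_crs_inittab is hypothetical, and it takes the file path as an argument so you can try it on a copy before touching /etc/inittab.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an Oracle-supplied tool): re-enable the CRS
# entries in an inittab-style file by stripping the leading '#' from lines
# that launch the init.evmd, init.cssd or init.crsd scripts.
# All other lines are left untouched; the original is saved as FILE.bak.
uncomment_crs_inittab() {
  local file="${1:-/etc/inittab}"
  if [ ! -w "$file" ]; then
    echo "cannot write to $file" >&2
    return 1
  fi
  # Assumption: each CRS entry references .../init.<daemon> followed by an
  # argument such as 'run' or 'fatal'.
  sed -i.bak -E \
    's@^#[[:space:]]*(.*/init\.(evmd|cssd|crsd)[[:space:]].*)$@\1@' "$file"
}
```

After editing the real /etc/inittab as root, `init q` tells init to re-read it. In our case, though, this step alone did not bring the stack up.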
I was pretty aghast. While we were considering logging a TAR, a search on Metalink turned up a command that I had never used in 10gR1. First, though, here is what the failing attempts looked like:
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./srvctl status nodeapps -n raclinux1
PRKH-1010 : Unable to communicate with CRS services.
[Communications Error(Native: prsr_initCLSS:[3])]
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./srvctl start nodeapps -n raclinux1
PRKH-1010 : Unable to communicate with CRS services.
[Communications Error(Native: prsr_initCLSS:[3])]
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./srvctl status nodeapps -n raclinux1
PRKH-1010 : Unable to communicate with CRS services.
[Communications Error(Native: prsr_initCLSS:[3])]
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
Redemption
That is when we tried crsctl start resources, and the CRS stack actually came up:
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./crsctl start resources
Starting resources.
Successfully started CRS resources
raclinux1:/opt/oracle/product/10.2.0/CRS/bin # ./crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
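If you need to bring the stack up more than once (for example, on each node during the upgrade), the sequence above can be wrapped in a small script. This is a minimal sketch, not an Oracle-documented procedure: it assumes crsctl lives in $ORA_CRS_HOME/bin (overridable through a hypothetical CRSCTL variable) and that a healthy stack prints the three "appears healthy" lines shown above. The function names and the 60-second default wait (CRS_WAIT_SECS) are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: CRSCTL, CRS_WAIT_SECS, crs_all_healthy and
# ensure_crs_up are names made up for this sketch.

# Succeeds only if the 'crsctl check crs' output read from stdin reports
# CSS, CRS and EVM as healthy.
crs_all_healthy() {
  local healthy
  healthy=$(grep -cE '^(CSS|CRS|EVM) appears healthy')
  [ "$healthy" -eq 3 ]
}

# Try the usual 'crsctl start crs' first; if the stack is still down after
# a wait, fall back to 'crsctl start resources', which is what worked for us.
ensure_crs_up() {
  local crsctl="${CRSCTL:-$ORA_CRS_HOME/bin/crsctl}"
  if [ ! -x "$crsctl" ]; then
    echo "crsctl not found or not executable: $crsctl" >&2
    return 2
  fi
  if "$crsctl" check crs | crs_all_healthy; then
    echo "CRS stack is already healthy"
    return 0
  fi
  "$crsctl" start crs
  # Assumed wait time; 'start crs' returns before the daemons are up.
  sleep "${CRS_WAIT_SECS:-60}"
  if "$crsctl" check crs | crs_all_healthy; then
    return 0
  fi
  echo "CRS still down after 'crsctl start crs'; trying 'crsctl start resources'" >&2
  "$crsctl" start resources
  # The function's exit status reflects the final health check.
  "$crsctl" check crs | crs_all_healthy
}
```

Source the file and run ensure_crs_up as root. Trying crsctl start crs first keeps the usual path as the default; the fallback only kicks in when the stack is still down after the wait.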
Conclusion
At this point, I am not sure why the behaviour changed in 10gR2: whether it was intentional, unintentional, or simply a bug. Either way, I am glad that we have a workaround. Every day brings something new to learn.