X

An Oracle blog about Exadata

  • September 18, 2009

Cold failover for a single instance RAC database

Rene Kundersma
Software Engineer

This blog posting is about protecting an instance from a 10.2/11.1 single instance RAC database so that it can act in a cold-failover situation. I want to refer to the great document "Using Oracle Clusterware to Protect a single instance oracle database 11g" written by my collegue Philip Newlan since most input comes from here.

The mentioned pdf describes how to make sure a single instance database can failover with the use of Oracle Clusterware.

With this posting I want to show how to do this for a single instance RAC database where you have to manage instance1' and 'instance2' instead of just one instance.

Since the grid control agent may have some problems with an instance[number] travelling from node1 to node2 this choice was made on purpose for some customer. Other reason for this awkward solution is that the application for some reason cannot run with two instances concurrently.

Also, the fact that instance deployment with sequential instance numbers is standard in their grid environment the choice has been made to do this with different instance numbers instead of one.

Within this posting I will also show the required updates in tnsnames and spfile.

First, I made sure the OCR entries of the RAC database, the Services and the instances are removed from the OCR. Then a "resource group" will be created. This is the container for all the resources.


oracle@pts0138([crs]):/ora/product/11.1.0/crs> crs_profile -create \

oss.xdbprk.rg -t application -a \

/ora/product/11.1.0/crs/crs/public/act_resgroup.pl -o "ci=600"

oracle@pts0138([crs]):/ora/product/11.1.0/crs> crs_register oss.xdbprk.rg

Now, let's verify the new entry:


oracle@pts0138([crs]):/ora/product/11.1.0/crs/crs/public> crsstat | grep rg

HA Resource Target State

oss.xdbprk.rg OFFLINE OFFLINE

The scripts mentioned in the pdf are placed in $CRS_HOME/crs/public on both nodes. I made sure the scripts are executable and tested them:


export CLUSTERWARE_HOME=/ora/product/11.1.0/crs/

export ORACLE_HOME=/ora/product/10.2.0/db_2

export _USR_ORA_LANG=$ORACLE_HOME

export _USR_ORA_SRV=xdbprk2

export _USR_ORA_FLAGS=1

oracle@pts0138(xdbprk2):/tmp> $CLUSTERWARE_HOME/crs/public/act_db.pl start

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 10:12:21 2009

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

SQL> Connected to an idle instance.

SQL> ORACLE instance started.

Total System Global Area 536870912 bytes

Fixed Size 2085360 bytes

Variable Size 150998544 bytes

Database Buffers 377487360 bytes

Redo Buffers 6299648 bytes

Database mounted.

Database opened.

SQL> Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production

With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,

Data Mining and Real Application Testing options

Also tested the stop function.


oracle@pts0138(xdbprk2):/tmp> $CLUSTERWARE_HOME/crs/public/act_db.pl stop

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 10:12:37 2009

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

SQL> Connected.

SQL> Database closed.

Database dismounted.

ORACLE instance shut down.

SQL> Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production

With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,

Data Mining and Real Application Testing options

oracle@pts0138(xdbprk2):/ora/product/11.1.0/crs/crs/public>

The action above was executed on both nodes. Just to verify if the instance could be started/stoped with the scripts. Only problem was: for each node, the ORACLE_SID had to be changed, as both nodes have another ORACLE_SID for that database. With only one SID you would not have the problem.

Then, the failover resource was created and registered.


oracle@pts0138([crs]):/tmp> crs_profile -create oss.xdbprk.db-cold-failover \

-t application -r oss.xdbprk.rg -a

/ora/product/11.1.0/crs/crs/public/act_db.pl

-o "ci=20,ra=5,osrv=xdbprk,ol=/ora/product/10.2.0/db_2,oflags=1,rt=600"

oracle@pts0138([crs]):/tmp> crs_register oss.xdbprk.db-cold-failover

The value osrv=xdbprk will never work as this is not the correct instance name for any of the nodes. Even if I made the value osrv=xdbprk1, then the script would only work on one of the nodes i.e. the node that had the appropriate init.ora etc.

So, leaving the value osrv=xdbprk to this, I actually hard-coded the ORACLE_SID on both sides of the cluster in the act_db.pl. Since the value is now hard coded, it should work. This clearly limits the option to re-use the script for other database, so I'd better change the name of the script to act_db_xdbprk.pl if I do this for real.

So, how will the resource profiles look now ?


oracle@pts0101([crs]):/var/opt/oracle> crs_profile -print oss.xdbprk.rg

NAME=oss.xdbprk.rg

TYPE=application

ACTION_SCRIPT=/ora/product/11.1.0/crs/crs/public/act_resgroup.pl

ACTIVE_PLACEMENT=0

AUTO_START=restore

CHECK_INTERVAL=600

DESCRIPTION=oss.xdbprk.rg

FAILOVER_DELAY=0

FAILURE_INTERVAL=0

FAILURE_THRESHOLD=0

HOSTING_MEMBERS=

OPTIONAL_RESOURCES=

PLACEMENT=balanced

REQUIRED_RESOURCES=

RESTART_ATTEMPTS=1

SCRIPT_TIMEOUT=60

START_TIMEOUT=0

STOP_TIMEOUT=0

UPTIME_THRESHOLD=7d

USR_ORA_ALERT_NAME=

USR_ORA_CHECK_TIMEOUT=0

USR_ORA_CONNECT_STR=/ as sysdba

USR_ORA_DEBUG=0

USR_ORA_DISCONNECT=false

USR_ORA_FLAGS=

USR_ORA_IF=

USR_ORA_INST_NOT_SHUTDOWN=

USR_ORA_LANG=

USR_ORA_NETMASK=

USR_ORA_OPEN_MODE=

USR_ORA_OPI=false

USR_ORA_PFILE=

USR_ORA_PRECONNECT=none

USR_ORA_SRV=

USR_ORA_START_TIMEOUT=0

USR_ORA_STOP_MODE=immediate

USR_ORA_STOP_TIMEOUT=0

USR_ORA_VIP=

oracle@pts0101([crs]):/var/opt/oracle> crs_profile -print oss.xdbprk.db-cold-failover

NAME=oss.xdbprk.db-cold-failover

TYPE=application

ACTION_SCRIPT=/ora/product/11.1.0/crs/crs/public/act_db.pl

ACTIVE_PLACEMENT=0

AUTO_START=restore

CHECK_INTERVAL=20

DESCRIPTION=oss.xdbprk.db-cold-failover

FAILOVER_DELAY=0

FAILURE_INTERVAL=0

FAILURE_THRESHOLD=0

HOSTING_MEMBERS=

OPTIONAL_RESOURCES=

PLACEMENT=balanced

REQUIRED_RESOURCES=oss.xdbprk.rg

RESTART_ATTEMPTS=5

SCRIPT_TIMEOUT=60

START_TIMEOUT=600

STOP_TIMEOUT=0

UPTIME_THRESHOLD=7d

USR_ORA_ALERT_NAME=

USR_ORA_CHECK_TIMEOUT=0

USR_ORA_CONNECT_STR=/ as sysdba

USR_ORA_DEBUG=0

USR_ORA_DISCONNECT=false

USR_ORA_FLAGS=1

USR_ORA_IF=

USR_ORA_INST_NOT_SHUTDOWN=

USR_ORA_LANG=/ora/product/10.2.0/db_2

USR_ORA_NETMASK=

USR_ORA_OPEN_MODE=

USR_ORA_OPI=false

USR_ORA_PFILE=

USR_ORA_PRECONNECT=none

USR_ORA_SRV=xdbprk

USR_ORA_START_TIMEOUT=0

USR_ORA_STOP_MODE=immediate

USR_ORA_STOP_TIMEOUT=0

USR_ORA_VIP=

Okay, and then the basic test of starting the resource, first all is down:


oracle@pts0101([crs]):/var/opt/oracle> crsstat

HA Resource Target State

----------- ------ -----

oss.xdbprk.db-cold-failover OFFLINE OFFLINE

oss.xdbprk.rg OFFLINE OFFLINE

Then, the start:


oracle@pts0101([crs]):/var/opt/oracle> crs_start oss.xdbprk.db-cold-failover

Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0138`

Start of `oss.xdbprk.db-cold-failover` on member `pts0138` succeeded.

oracle@pts0138([crs]):/var/opt/oracle> ps -ef | grep smon | grep dbprk

oracle 31568 1 0 11:51 ? 00:00:00 ora_smon_xdbprk2

And the relocate:


oracle@pts0101([crs]):/var/opt/oracle> crs_relocate -f oss.xdbprk.db-cold-failover

Attempting to stop `oss.xdbprk.db-cold-failover` on member `pts0138`

Stop of `oss.xdbprk.db-cold-failover` on member `pts0138` succeeded.

Attempting to stop `oss.xdbprk.rg` on member `pts0138`

Stop of `oss.xdbprk.rg` on member `pts0138` succeeded.

Attempting to start `oss.xdbprk.rg` on member `pts0101`

Start of `oss.xdbprk.rg` on member `pts0101` succeeded.

Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0101`

Start of `oss.xdbprk.db-cold-failover` on member `pts0101` succeeded.

Let's verify if it runs on the other node:


oracle@pts0101([crs]):/var/opt/oracle> ps -ef | grep smon | grep dbprk

oracle 11568 1 0 11:55 ? 00:00:00 ora_smon_xdbprk1

And stopped on the original:


oracle@pts0138(xdbprk2):/ora/product/10.2.0/db_2/dbs> ps -ef | grep smon | grep dbprk

Okay, what is left to do from here:

Service names that used to be managed by CRS, now have to be coded hard in the spfile so that they register each time with the listener:

service_names in spfile added:


dbprk.xe.grid

dbprk.xe.supp

dbprk.xe.link

Also, for each node, another local listener will be used. I made sure this is in the spfile:


xdbprk1.local_listener='pts0101-LOCAL-LISTENER'

xdbprk2.local_listener='pts0138-LOCAL-LISTENER'

In order to be sure there can only be started one instance the cluster_database_instances_parameter is set to 1.


*.cluster_database_instances=1

For tnsnames a normail failover entry is created, if the first instance is down, the next will be found (and should be running):


oracle@pts0101(xdbprk1):/ora/dbprk/admin/xdbprk1/bdump> tnsping dbprk.xe.grid

TNS Ping Utility for Linux: Version 10.2.0.4.0 - Production on 18-SEP-2009 12:39:14

Copyright (c) 1997, 2007, Oracle. All rights reserved.

Used parameter files:

/etc/oss/sqlnet.ora

Used TNSNAMES adapter to resolve the alias

Attempting to contact (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=pts0101-grid.nl.eu.abnamro.com)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=pts0138-grid.nl.eu.abnamro.com)(PORT=1521))(LOAD_BALANCE=ON))(CONNECT_DATA=(SERVICE_NAME=dbprk.xe.grid)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=20))))

OK (10 msec)

So, another basic test, what is the situation:


HA Resource Target State

----------- ------ -----

oss.xdbprk.db-cold-failover ONLINE ONLINE on pts0101

oss.xdbprk.rg ONLINE ONLINE on pts0101

oracle@pts0101([crs]):/ora/dbprk/admin/xdbprk1/bdump> crs_start oss.xdbprk.db-cold-failover

Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0101`

Start of `oss.xdbprk.db-cold-failover` on member `pts0101` succeeded.

In which instance will my session end ?


oracle@pts0101(xdbprk1):/ora/dbprk/admin/xdbprk1/bdump> sqlplus rk/rk@dbprk.xe.grid

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 12:39:55 2009

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to:

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production

With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,

Data Mining and Real Application Testing options

SQL> select instance_name from v$instance;

INSTANCE_NAME

----------------

xdbprk1

And after the relocate, the session should go to the other instance:


oracle@pts0101([crs]):/ora/dbprk/admin/xdbprk1/bdump> crs_relocate -f oss.xdbprk.db-cold-failover

Attempting to stop `oss.xdbprk.db-cold-failover` on member `pts0101`

Stop of `oss.xdbprk.db-cold-failover` on member `pts0101` succeeded.

Attempting to stop `oss.xdbprk.rg` on member `pts0101`

Stop of `oss.xdbprk.rg` on member `pts0101` succeeded.

Attempting to start `oss.xdbprk.rg` on member `pts0138`

Start of `oss.xdbprk.rg` on member `pts0138` succeeded.

Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0138`

Start of `oss.xdbprk.db-cold-failover` on member `pts0138` succeeded.

oracle@pts0101([crs]):/ora/dbprk/admin/xdbprk1/bdump> crsstat

HA Resource Target State

----------- ------ -----

oss.xdbprk.db-cold-failover ONLINE ONLINE on pts0138

oss.xdbprk.rg ONLINE ONLINE on pts0138

oracle@pts0101(xdbprk1):/ora/dbprk/admin/xdbprk1/bdump> sqlplus rk/rk@dbprk.xe.grid

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 12:41:22 2009

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to:

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production

With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,

Data Mining and Real Application Testing options

SQL> select instance_name from v$instance;

INSTANCE_NAME

----------------

xdbprk2

As an extra test, to make sure the two instances cannot be started concurrently, started with a test to start the second instance after starting the first.

As you can see this is not possible.


oracle@pts0101(*):/var/opt/oracle> db xdbprk1

ORACLE_SID=xdbprk1

ORACLE_HOME=/ora/product/10.2.0/db_2

oracle@pts0101(xdbprk1):/var/opt/oracle> sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 12:42:02 2009

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to an idle instance.

SQL> startup;

ORACLE instance started.

Total System Global Area 536870912 bytes

Fixed Size 2085360 bytes

Variable Size 331353616 bytes

Database Buffers 197132288 bytes

Redo Buffers 6299648 bytes

Database mounted.

ORA-01092: ORACLE instance terminated. Disconnection forced

Rene Kundersma

Oracle Technology Services, The Netherlands

Join the discussion

Comments ( 6 )
  • Pattar Tuesday, October 6, 2009
    I was looking into Omotion but that looks like in 11gR2 only. Will have to do further reading on it... But, your way will work with DB stopped/started (cold down) which might also work in my case...
  • Pattar Tuesday, October 6, 2009
    Oh! nevermind - I need to look into resource manager instance caging in my case.
    Thanks.
  • guest Friday, July 15, 2011

    Any white papers describing how to fail over an Oracle 11g Release 2 (non-RAC) Database


  • Rene Monday, July 18, 2011

    For this we recommend RAC One Node.


  • guest Tuesday, July 19, 2011

    I need assistance or any documents for Oracle 11.2.0.2 ColdFailover setup.


  • guest Tuesday, November 8, 2011

    RAC One Node is licensed.

    Cold failover maybe not.

    So, any white papers describing how to fail over an Oracle 11g Release 2 (non-RAC) Database?


Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.