Cold failover for a single instance RAC database

This blog posting is about protecting an instance from a 10.2/11.1 single instance RAC database so that it can act in a cold-failover situation. I want to refer to the great document "Using Oracle Clusterware to Protect a single instance oracle database 11g" written by my collegue Philip Newlan since most input comes from here.

The mentioned pdf describes how to make sure a single instance database can failover with the use of Oracle Clusterware.

With this posting I want to show how to do this for a single instance RAC database where you have to manage instance1' and 'instance2' instead of just one instance.

Since the grid control agent may have some problems with an instance[number] travelling from node1 to node2 this choice was made on purpose for some customer. Other reason for this awkward solution is that the application for some reason cannot run with two instances concurrently.

Also, the fact that instance deployment with sequential instance numbers is standard in their grid environment the choice has been made to do this with different instance numbers instead of one.

Within this posting I will also show the required updates in tnsnames and spfile.

First, I made sure the OCR entries of the RAC database, the Services and the instances are removed from the OCR. Then a "resource group" will be created. This is the container for all the resources.


oracle@pts0138([crs]):/ora/product/11.1.0/crs> crs_profile -create \
oss.xdbprk.rg -t application -a \
/ora/product/11.1.0/crs/crs/public/act_resgroup.pl -o "ci=600"
oracle@pts0138([crs]):/ora/product/11.1.0/crs> crs_register oss.xdbprk.rg

Now, let's verify the new entry:


oracle@pts0138([crs]):/ora/product/11.1.0/crs/crs/public> crsstat | grep rg
HA Resource Target State
oss.xdbprk.rg OFFLINE OFFLINE

The scripts mentioned in the pdf are placed in $CRS_HOME/crs/public on both nodes. I made sure the scripts are executable and tested them:


export CLUSTERWARE_HOME=/ora/product/11.1.0/crs/
export ORACLE_HOME=/ora/product/10.2.0/db_2
export _USR_ORA_LANG=$ORACLE_HOME
export _USR_ORA_SRV=xdbprk2
export _USR_ORA_FLAGS=1
oracle@pts0138(xdbprk2):/tmp> $CLUSTERWARE_HOME/crs/public/act_db.pl start
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 10:12:21 2009
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
SQL> Connected to an idle instance.
SQL> ORACLE instance started.
Total System Global Area 536870912 bytes
Fixed Size 2085360 bytes
Variable Size 150998544 bytes
Database Buffers 377487360 bytes
Redo Buffers 6299648 bytes
Database mounted.
Database opened.
SQL> Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,
Data Mining and Real Application Testing options

Also tested the stop function.


oracle@pts0138(xdbprk2):/tmp> $CLUSTERWARE_HOME/crs/public/act_db.pl stop
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 10:12:37 2009
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
SQL> Connected.
SQL> Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,
Data Mining and Real Application Testing options
oracle@pts0138(xdbprk2):/ora/product/11.1.0/crs/crs/public>

The action above was executed on both nodes. Just to verify if the instance could be started/stoped with the scripts. Only problem was: for each node, the ORACLE_SID had to be changed, as both nodes have another ORACLE_SID for that database. With only one SID you would not have the problem.

Then, the failover resource was created and registered.


oracle@pts0138([crs]):/tmp> crs_profile -create oss.xdbprk.db-cold-failover \
-t application -r oss.xdbprk.rg -a
/ora/product/11.1.0/crs/crs/public/act_db.pl
-o "ci=20,ra=5,osrv=xdbprk,ol=/ora/product/10.2.0/db_2,oflags=1,rt=600"
oracle@pts0138([crs]):/tmp> crs_register oss.xdbprk.db-cold-failover

The value osrv=xdbprk will never work as this is not the correct instance name for any of the nodes. Even if I made the value osrv=xdbprk1, then the script would only work on one of the nodes i.e. the node that had the appropriate init.ora etc.

So, leaving the value osrv=xdbprk to this, I actually hard-coded the ORACLE_SID on both sides of the cluster in the act_db.pl. Since the value is now hard coded, it should work. This clearly limits the option to re-use the script for other database, so I'd better change the name of the script to act_db_xdbprk.pl if I do this for real.

So, how will the resource profiles look now ?


oracle@pts0101([crs]):/var/opt/oracle> crs_profile -print oss.xdbprk.rg
NAME=oss.xdbprk.rg
TYPE=application
ACTION_SCRIPT=/ora/product/11.1.0/crs/crs/public/act_resgroup.pl
ACTIVE_PLACEMENT=0
AUTO_START=restore
CHECK_INTERVAL=600
DESCRIPTION=oss.xdbprk.rg
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=
OPTIONAL_RESOURCES=
PLACEMENT=balanced
REQUIRED_RESOURCES=
RESTART_ATTEMPTS=1
SCRIPT_TIMEOUT=60
START_TIMEOUT=0
STOP_TIMEOUT=0
UPTIME_THRESHOLD=7d
USR_ORA_ALERT_NAME=
USR_ORA_CHECK_TIMEOUT=0
USR_ORA_CONNECT_STR=/ as sysdba
USR_ORA_DEBUG=0
USR_ORA_DISCONNECT=false
USR_ORA_FLAGS=
USR_ORA_IF=
USR_ORA_INST_NOT_SHUTDOWN=
USR_ORA_LANG=
USR_ORA_NETMASK=
USR_ORA_OPEN_MODE=
USR_ORA_OPI=false
USR_ORA_PFILE=
USR_ORA_PRECONNECT=none
USR_ORA_SRV=
USR_ORA_START_TIMEOUT=0
USR_ORA_STOP_MODE=immediate
USR_ORA_STOP_TIMEOUT=0
USR_ORA_VIP=


oracle@pts0101([crs]):/var/opt/oracle> crs_profile -print oss.xdbprk.db-cold-failover
NAME=oss.xdbprk.db-cold-failover
TYPE=application
ACTION_SCRIPT=/ora/product/11.1.0/crs/crs/public/act_db.pl
ACTIVE_PLACEMENT=0
AUTO_START=restore
CHECK_INTERVAL=20
DESCRIPTION=oss.xdbprk.db-cold-failover
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=
OPTIONAL_RESOURCES=
PLACEMENT=balanced
REQUIRED_RESOURCES=oss.xdbprk.rg
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_TIMEOUT=600
STOP_TIMEOUT=0
UPTIME_THRESHOLD=7d
USR_ORA_ALERT_NAME=
USR_ORA_CHECK_TIMEOUT=0
USR_ORA_CONNECT_STR=/ as sysdba
USR_ORA_DEBUG=0
USR_ORA_DISCONNECT=false
USR_ORA_FLAGS=1
USR_ORA_IF=
USR_ORA_INST_NOT_SHUTDOWN=
USR_ORA_LANG=/ora/product/10.2.0/db_2
USR_ORA_NETMASK=
USR_ORA_OPEN_MODE=
USR_ORA_OPI=false
USR_ORA_PFILE=
USR_ORA_PRECONNECT=none
USR_ORA_SRV=xdbprk
USR_ORA_START_TIMEOUT=0
USR_ORA_STOP_MODE=immediate
USR_ORA_STOP_TIMEOUT=0
USR_ORA_VIP=

Okay, and then the basic test of starting the resource, first all is down:


oracle@pts0101([crs]):/var/opt/oracle> crsstat
HA Resource Target State
----------- ------ -----
oss.xdbprk.db-cold-failover OFFLINE OFFLINE
oss.xdbprk.rg OFFLINE OFFLINE

Then, the start:


oracle@pts0101([crs]):/var/opt/oracle> crs_start oss.xdbprk.db-cold-failover
Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0138`
Start of `oss.xdbprk.db-cold-failover` on member `pts0138` succeeded.
oracle@pts0138([crs]):/var/opt/oracle> ps -ef | grep smon | grep dbprk
oracle 31568 1 0 11:51 ? 00:00:00 ora_smon_xdbprk2

And the relocate:


oracle@pts0101([crs]):/var/opt/oracle> crs_relocate -f oss.xdbprk.db-cold-failover
Attempting to stop `oss.xdbprk.db-cold-failover` on member `pts0138`
Stop of `oss.xdbprk.db-cold-failover` on member `pts0138` succeeded.
Attempting to stop `oss.xdbprk.rg` on member `pts0138`
Stop of `oss.xdbprk.rg` on member `pts0138` succeeded.
Attempting to start `oss.xdbprk.rg` on member `pts0101`
Start of `oss.xdbprk.rg` on member `pts0101` succeeded.
Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0101`
Start of `oss.xdbprk.db-cold-failover` on member `pts0101` succeeded.

Let's verify if it runs on the other node:


oracle@pts0101([crs]):/var/opt/oracle> ps -ef | grep smon | grep dbprk
oracle 11568 1 0 11:55 ? 00:00:00 ora_smon_xdbprk1

And stopped on the original:


oracle@pts0138(xdbprk2):/ora/product/10.2.0/db_2/dbs> ps -ef | grep smon | grep dbprk

Okay, what is left to do from here:

Service names that used to be managed by CRS, now have to be coded hard in the spfile so that they register each time with the listener:

service_names in spfile added:


dbprk.xe.grid
dbprk.xe.supp
dbprk.xe.link

Also, for each node, another local listener will be used. I made sure this is in the spfile:


xdbprk1.local_listener='pts0101-LOCAL-LISTENER'
xdbprk2.local_listener='pts0138-LOCAL-LISTENER'

In order to be sure there can only be started one instance the cluster_database_instances_parameter is set to 1.


*.cluster_database_instances=1

For tnsnames a normail failover entry is created, if the first instance is down, the next will be found (and should be running):


oracle@pts0101(xdbprk1):/ora/dbprk/admin/xdbprk1/bdump> tnsping dbprk.xe.grid
TNS Ping Utility for Linux: Version 10.2.0.4.0 - Production on 18-SEP-2009 12:39:14
Copyright (c) 1997, 2007, Oracle. All rights reserved.
Used parameter files:
/etc/oss/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=pts0101-grid.nl.eu.abnamro.com)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=pts0138-grid.nl.eu.abnamro.com)(PORT=1521))(LOAD_BALANCE=ON))(CONNECT_DATA=(SERVICE_NAME=dbprk.xe.grid)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=20))))
OK (10 msec)

So, another basic test, what is the situation:


HA Resource Target State
----------- ------ -----
oss.xdbprk.db-cold-failover ONLINE ONLINE on pts0101
oss.xdbprk.rg ONLINE ONLINE on pts0101
oracle@pts0101([crs]):/ora/dbprk/admin/xdbprk1/bdump> crs_start oss.xdbprk.db-cold-failover
Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0101`
Start of `oss.xdbprk.db-cold-failover` on member `pts0101` succeeded.

In which instance will my session end ?


oracle@pts0101(xdbprk1):/ora/dbprk/admin/xdbprk1/bdump> sqlplus rk/rk@dbprk.xe.grid
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 12:39:55 2009
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,
Data Mining and Real Application Testing options
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
xdbprk1

And after the relocate, the session should go to the other instance:


oracle@pts0101([crs]):/ora/dbprk/admin/xdbprk1/bdump> crs_relocate -f oss.xdbprk.db-cold-failover
Attempting to stop `oss.xdbprk.db-cold-failover` on member `pts0101`
Stop of `oss.xdbprk.db-cold-failover` on member `pts0101` succeeded.
Attempting to stop `oss.xdbprk.rg` on member `pts0101`
Stop of `oss.xdbprk.rg` on member `pts0101` succeeded.
Attempting to start `oss.xdbprk.rg` on member `pts0138`
Start of `oss.xdbprk.rg` on member `pts0138` succeeded.
Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0138`
Start of `oss.xdbprk.db-cold-failover` on member `pts0138` succeeded.
oracle@pts0101([crs]):/ora/dbprk/admin/xdbprk1/bdump> crsstat
HA Resource Target State
----------- ------ -----
oss.xdbprk.db-cold-failover ONLINE ONLINE on pts0138
oss.xdbprk.rg ONLINE ONLINE on pts0138


oracle@pts0101(xdbprk1):/ora/dbprk/admin/xdbprk1/bdump> sqlplus rk/rk@dbprk.xe.grid
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 12:41:22 2009
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,
Data Mining and Real Application Testing options
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
xdbprk2

As an extra test, to make sure the two instances cannot be started concurrently, started with a test to start the second instance after starting the first.
As you can see this is not possible.


oracle@pts0101(*):/var/opt/oracle> db xdbprk1
ORACLE_SID=xdbprk1
ORACLE_HOME=/ora/product/10.2.0/db_2
oracle@pts0101(xdbprk1):/var/opt/oracle> sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 12:42:02 2009
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to an idle instance.
SQL> startup;
ORACLE instance started.
Total System Global Area 536870912 bytes
Fixed Size 2085360 bytes
Variable Size 331353616 bytes
Database Buffers 197132288 bytes
Redo Buffers 6299648 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced

Rene Kundersma
Oracle Technology Services, The Netherlands

Comments:

You say "you need to take out OCR entries for instances/services"? Why is that - so that you can mimic the RAC instance as a single DB with cold failover using the rg1 resources as mentioned in the pdf? If I did not remove the entries and still have single instance RAC - what steps should I take to move the Instance from one node to another? My DB version is 11.1.0.6. Thanks.

Posted by Pattar on September 23, 2009 at 11:51 PM PDT #

If you keep your existing OCR entries the database, instance and services will be managed the conventional way. What you need to do here is make sure only the script is used. Therefor you need to create a new entry. Resource groups are not really required b.t.w. If you only want to move the RAC instance from one node to the other, you need to make sure you have init.ora, password-file, diag_dest etc. in place. By the way, I was thinking "Rac one node" maybe what you are looking for: http://www.oracle.com/technology/products/database/clustering/pdf/ds-oracleraconenode-2009.pdf Rene

Posted by Rene on September 24, 2009 at 05:33 AM PDT #

I was looking into Omotion but that looks like in 11gR2 only. Will have to do further reading on it... But, your way will work with DB stopped/started (cold down) which might also work in my case...

Posted by Pattar on October 06, 2009 at 04:53 AM PDT #

Oh! nevermind - I need to look into resource manager instance caging in my case. Thanks.

Posted by Pattar on October 06, 2009 at 04:57 AM PDT #

Any white papers describing how to fail over an Oracle 11g Release 2 (non-RAC) Database

Posted by guest on July 14, 2011 at 08:16 PM PDT #

For this we recommend RAC One Node.

Posted by Rene on July 18, 2011 at 04:21 AM PDT #

I need assistance or any documents for Oracle 11.2.0.2 ColdFailover setup.

Posted by guest on July 19, 2011 at 12:13 AM PDT #

RAC One Node is licensed.
Cold failover maybe not.
So, any white papers describing how to fail over an Oracle 11g Release 2 (non-RAC) Database?

Posted by guest on November 07, 2011 at 11:47 PM PST #

Post a Comment:
  • HTML Syntax: NOT allowed
About

Blog of Rene Kundersma, Principal Member of Technical Staff at Oracle Development USA. I am designing and evaluating solutions and best practices around database MAA focused on Exadata. This involves HA, backup/recovery, migration and database consolidation and upgrades on Exadata. Opinions are my own and not necessarily those of Oracle Corporation. See http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today