Clusterware 11gR2 – Setting up an Active/Passive failover configuration

Oracle provides a large range of interesting solutions to ensure High Availability of the database.
Data Guard, RAC, or even both combined (as recommended by Oracle for a Maximum Availability Architecture - MAA) are the most frequently used solutions.
But when it comes to protecting a system with an Active/Passive architecture with failover capabilities, people often think of expensive third-party cluster systems.

Oracle Clusterware technology, which comes at no extra cost with Oracle Database or Oracle Unbreakable Linux, is - in most people's minds - tied to Oracle RAC and is therefore seldom used to implement failover solutions.
Oracle Clusterware 11gR2 (part of Oracle 11gR2 Grid Infrastructure) provides a comprehensive framework to set up automatic failover configurations.
It is actually possible to make "failover-able", and thereby protect, almost any kind of application (from a simple xclock to the most complex Application Server).

Quoting Oracle:
“Oracle Clusterware is a portable cluster software that allows clustering of single servers so that they cooperate as a single system. Oracle Clusterware also provides the required infrastructure for Oracle Real Application Clusters (RAC). In addition Oracle Clusterware enables the protection of any Oracle application or any other kind of application within a cluster.”

In the next couple of lines, I will present the different steps to achieve this goal: a fully operational 11gR2 database protected by automatic failover capabilities.

I assume you are fluent in installing Oracle Database 11gR2 and Oracle Grid Infrastructure 11gR2 on a Linux system, and that ASM is not a problem for you (I am using it as the shared storage).
If not, please have a look at Oracle Documentation.

As often, I made my tests in an Oracle VirtualBox environment. The scripts are tested and functional on my system; unfortunately, there can always be a typo or a mistake.

This blog entry does not replace a course on the Clusterware framework.
I just hope it will show you how powerful it is and give you the will to go further with it...

Note-1 : Feb-2011. This entry has been revised (rev.2) following comments from Philip Newlan.

Note-2 : 01-Aug-2011. This entry has been revised (rev.3) to answer the most common questions and to clarify the position of this solution with regard to Oracle RacOne.

 

Prerequisites

  • 2 Linux boxes (OELCluster01 and OELCluster02) at the same OS level. I used OEL 5 Update 5 with an Enterprise Kernel.
  • Shared Storage (SAN). On my VirtualBox system, I used Openfiler to simulate the SAN
  • Oracle 11gR2 Database (11.2.0.1)
  • Oracle 11gR2 Grid Infrastructure (11.2.0.1)

Step 1 - Install the software

  • Using asmlib, create 3 ASM disks (ASM_CRS, ASM_DTA and ASM_FRA)
  • Install Grid Infrastructure for a cluster (OELCluster01 and OELCluster02 are the 2 nodes of the cluster)
    • Use ASM_CRS to store Voting Disk and OCR.
    • Use SCAN.
  • Install Oracle Database Standalone binaries on both nodes.
    • Use asmca to check/mount the disk groups on 2 nodes
    • Use dbca to create and configure a database on the primary node
      • Let's name it DB11G.
      • Copy the pfile, password file to the second node.
      • Create the adump directory on the second node.

Step 2 - Setup the resource to be protected

After its creation with dbca, the database is automatically protected by the Oracle Restart technology that comes with Grid Infrastructure. Consequently, it restarts automatically (if possible) after a crash (e.g. a kill -9 of smon).
A database resource has been created for this in the Cluster Registry.
We can observe it with the command crsctl status resource, which shows an ora.db11g.db entry.

Let's save the definition of this resource for future use:

  • mkdir -p /crs/11.2.0/HA_scripts
  • chown oracle:oinstall /crs/11.2.0/HA_scripts
  • crsctl status resource ora.db11g.db -p > /crs/11.2.0/HA_scripts/myResource.txt
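The file written by this last command is a plain list of ATTRIBUTE=value lines (it is reproduced further down in this post). Should you want to pull a single attribute out of it while preparing the new resource file, standard text tools are enough. A small sketch, using a hand-made sample instead of the real crsctl output:

```shell
# Extract one attribute from a saved resource profile (ATTRIBUTE=value lines).
profile=/tmp/myResource_sample.txt

# Hand-made sample standing in for the real 'crsctl status resource -p' output.
cat > "$profile" <<'EOF'
NAME=ora.db11g.db
TYPE=ora.database.type
SPFILE=+DTA/DB11G/spfileDB11G.ora
EOF

# Grab the value of one attribute, here SPFILE.
spfile=$(awk -F= '$1 == "SPFILE" {print $2}' "$profile")
echo "$spfile"
```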

Although very interesting, Oracle Restart is not cluster-aware and cannot restart the database on another node of the cluster. So let's remove it from the OCR definitions; we don't need it!

  • srvctl stop database -d DB11G
  • srvctl remove database -d DB11G

Instead, we need to create a new resource of a more general type: cluster_resource. Here are the steps to achieve this:

  • Create an action script : /crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh

#!/bin/bash

export ORACLE_HOME=/oracle/product/11.2.0/dbhome_1
export ORACLE_SID=DB11G

case "$1" in
'start')
  $ORACLE_HOME/bin/sqlplus /nolog <<EOF
connect / as sysdba
startup
EOF
  RET=0
  ;;
'stop')
  $ORACLE_HOME/bin/sqlplus /nolog <<EOF
connect / as sysdba
shutdown immediate
EOF
  RET=0
  ;;
'clean')
  $ORACLE_HOME/bin/sqlplus /nolog <<EOF
connect / as sysdba
shutdown abort
EOF
  # Last resort, if the abort leaves processes behind:
  # for i in `ps -ef | grep -i $ORACLE_SID | awk '{print $2}'`; do kill -9 $i; done
  RET=0
  ;;
'check')
  # The instance is considered up if its smon background process is alive.
  ok=`ps -ef | grep ora_smon_${ORACLE_SID} | grep -v grep | wc -l`
  if [ $ok -eq 0 ]; then
    RET=1
  else
    RET=0
  fi
  ;;
*)
  RET=0
  ;;
esac

if [ $RET -eq 0 ]; then
  exit 0
else
  exit 1
fi

This script must provide, at least, methods to start, stop, clean and check the database. It is self-explanatory and contains nothing special. Just be aware that it must be executable (+x), that it runs as the oracle user (because of the ACL property - see later) and that it needs to know about the environment. Also make sure it exists on every node of the cluster. Moreover, as of 11.2, the clean method is mandatory. It must provide the "last gasp clean-up", for example a shutdown abort or a kill -9 of all the remaining processes.
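Before registering anything, it is worth exercising the script's exit-status contract on its own: Clusterware only sees exit 0 (success) or non-zero (failure). The following self-contained sketch simulates the check method with an arbitrary process pattern (here the current shell's PID) instead of a real smon process:

```shell
# Simulate the 'check' method: look for a process matching a pattern and
# return 0 (running) or 1 (not running) - the contract Clusterware relies on.
check_running() {
  local pattern=$1
  local count
  count=$(ps -ef | grep "$pattern" | grep -v grep | wc -l)
  if [ "$count" -gt 0 ]; then
    return 0
  else
    return 1
  fi
}

# The current shell's PID is always in the process list...
check_running "$$" && echo "check: running"
# ...while a nonsense pattern is not.
check_running "no_such_process_xyzzy" || echo "check: not running"
```

In the real action script the pattern is ora_smon_${ORACLE_SID}, smon being a mandatory background process of any running instance.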

  • chmod +x /crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh
  • scp /crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh oracle@OELCluster02:/crs/11.2.0/HA_scripts

Create a new resource file, based on the information we got in myResource.txt, and name it myNewResource.txt.
myResource.txt is shown below.
As we can see, it defines an ora.database.type resource named ora.db11g.db.
A lot of properties are specific to this type of resource and do not need to be used for a cluster_resource.

NAME=ora.db11g.db
TYPE=ora.database.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=1
CHECK_TIMEOUT=600
CLUSTER_DATABASE=false
DB_UNIQUE_NAME=DB11G
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=database) PROPERTY(DB_UNIQUE_NAME= CONCAT(PARSE(%NAME%, ., 2), %USR_ORA_DOMAIN%, .)) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%)
DEGREE=1
DESCRIPTION=Oracle Database resource
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=60
FAILURE_THRESHOLD=1
GEN_AUDIT_FILE_DEST=/oracle/admin/DB11G/adump
GEN_USR_ORA_INST_NAME=
GEN_USR_ORA_INST_NAME@SERVERNAME(oelcluster01)=DB11G
HOSTING_MEMBERS=
INSTANCE_FAILOVER=0
LOAD=1
LOGGING_LEVEL=1
MANAGEMENT_POLICY=AUTOMATIC
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORACLE_HOME=/oracle/product/11.2.0/dbhome_1
PLACEMENT=restricted
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=2
ROLE=PRIMARY
SCRIPT_TIMEOUT=60
SERVER_POOLS=ora.DB11G
SPFILE=+DTA/DB11G/spfileDB11G.ora
START_DEPENDENCIES=hard(ora.DTA.dg,ora.FRA.dg) weak(type:ora.listener.type,uniform:ora.ons,uniform:ora.eons) pullup(ora.DTA.dg,ora.FRA.dg)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DTA.dg,shutdown:ora.FRA.dg)
STOP_TIMEOUT=600
UPTIME_THRESHOLD=1h
USR_ORA_DB_NAME=DB11G
USR_ORA_DOMAIN=haroland
USR_ORA_ENV=
USR_ORA_FLAGS=
USR_ORA_INST_NAME=DB11G
USR_ORA_OPEN_MODE=open
USR_ORA_OPI=false
USR_ORA_STOP_MODE=immediate
VERSION=11.2.0.1.0


I removed the database-type-related entries from myResource.txt and modified some others to produce the following myNewResource.txt.

  • Notice the NAME property, which should not have the ora. prefix.
  • Notice the TYPE property, which is not ora.database.type but cluster_resource.
  • Notice the definition of ACTION_SCRIPT.
  • Notice HOSTING_MEMBERS, which enumerates the members of the cluster (as returned by the olsnodes command).

NAME=DB11G.db
TYPE=cluster_resource
DESCRIPTION=Oracle Database resource
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_SCRIPT=/crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh
PLACEMENT=restricted
ACTIVE_PLACEMENT=0
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=10
DEGREE=1
ENABLED=1
HOSTING_MEMBERS=oelcluster01 oelcluster02
LOGGING_LEVEL=1
RESTART_ATTEMPTS=1
START_DEPENDENCIES=hard(ora.DTA.dg,ora.FRA.dg) weak(type:ora.listener.type,uniform:ora.ons,uniform:ora.eons) pullup(ora.DTA.dg,ora.FRA.dg)
START_TIMEOUT=600
STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DTA.dg,shutdown:ora.FRA.dg)
STOP_TIMEOUT=600
UPTIME_THRESHOLD=1h
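A side note on HOSTING_MEMBERS: its value is simply the olsnodes output flattened to one space-separated line. A sketch (the olsnodes output is simulated here, since the real command needs a live cluster):

```shell
# Turn olsnodes-style output (one node name per line) into the
# space-separated list expected by HOSTING_MEMBERS.
# On a real cluster this would be: nodes=$(olsnodes | xargs)
nodes=$(printf 'oelcluster01\noelcluster02\n' | xargs)
echo "HOSTING_MEMBERS=$nodes"
```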

Register the resource.
Take care with the resource type: it needs to be a cluster_resource, not an ora.database.type resource (Oracle recommendation).

  • crsctl add resource DB11G.db -type cluster_resource -file /crs/11.2.0/HA_scripts/myNewResource.txt

Step 3 - Start the resource

  • crsctl start resource DB11G.db

This command launches the ACTION_SCRIPT with a start and a check parameter on the primary node of the cluster.

Step 4 - Test this

We will test the setup using 2 methods.

  • crsctl relocate resource DB11G.db

This command calls the ACTION_SCRIPT (on the two nodes) to stop the database on the active node and start it on the other node.
Once done, we can revert back to the original node, but this time using a more "MS$-like" method: turn off the server on which the database is running.
After a short delay, you should observe that the database is relocated to node 1.

 

Conclusion

Once the software is installed and the standalone database created (a rather common and usual task), the steps to reach the objective are quite easy:

  1. Create an executable action script on every node of the cluster.
  2. Create a resource file.
  3. Create/Register the resource with OCR using the resource file.
  4. Start the resource.

This solution is a very interesting alternative to licensable third-party solutions.

 

 

A few words about this solution compared to the Oracle RacOne alternative

As you might have understood, the solution I presented here removes the "db specificities" from the resource definition to make the database a "classical application".
That means the database no longer appears to Oracle Clusterware as an Oracle resource (the resource type moved from TYPE=ora.database.type to TYPE=cluster_resource, and the number of properties that qualify the resource is much reduced). Treatment of the application is individualized (non-standardized) and subject to what you implement in the resource attributes and action scripts.

RacOne, although a licensable solution, goes far beyond this simple solution. With RacOne, the resource remains a database resource (not a simple application resource). It is a fully tested and supported solution. It is compatible with Data Guard. It can easily be upgraded to RAC.

Let me be clear: that does not mean the solution presented here is unsupported. But as you are responsible for the content of your HA script, you cannot ask Oracle Support to help you in case of issues with it.
See MOS note #790189.1.

 

Regarding Licences :

At the time of this entry,

  • Oracle Grid Infrastructure does not need a licence. It comes with the Database licence.
  • Oracle RAC comes at no extra licence cost with Oracle 11g Standard Edition.
  • Oracle RAC needs an extra licence with Oracle 11g Enterprise Edition.
  • Oracle RacOne needs a specific licence.

 

Regarding Grid Infrastructure:

The solution I described needs a complete Grid Infrastructure installation, including the Interconnect and the SCAN.

  • The Interconnect will only be used for heartbeat purposes. Therefore, it does not need the large bandwidth required by RAC.
  • The SCAN will simplify client connections to a failed-over node. Also, the recommended 3 SCAN IPs are not strictly required (one is enough for testing).
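For reference, a client would then always use the SCAN name in its connect descriptor, whichever node currently hosts the database. A sketch of a tnsnames.ora entry (the SCAN name cluster-scan.haroland is a made-up example; replace it with your own):

```
DB11G =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = cluster-scan.haroland)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = DB11G.haroland))
  )
```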

 

 

References

Gilles Haro
Technical Expert - Core Technology, Oracle Consulting


Comments:

This is quite interesting - thanks for posting. However, it sounds similar to the RAC One Node functionality. Is it that using this method requires no additional RAC licensing costs? Thanks, Leighton

Posted by Leighton on January 10, 2011 at 07:14 PM CET #

Hello Leighton. Thank you for your comment. Indeed, RAC One Node can achieve the same functionality. And many more, like providing service continuity during upgrades or other operations on the server. RAC One Node is also a recommended solution by Oracle. The only thing is that it requires an extra RAC Licence. Not everyone can afford this, just to have Active/Passive failover capabilities. On the contrary, the solution provided here needs no extra licence as Oracle Clusterware comes included in the Database 11gR2 Licence or Unbreakable Linux. Moreover, this solution can be adapted to other applications. Rac One Node only applies to the Database. Hope this helps, Gilles

Posted by Gilles Haro on January 11, 2011 at 01:56 AM CET #

Hi, Have you tried this with an application-vip, and let the appvip be part of the same 'package' as the db + listener? I've tried it and could never get the dependencies to work properly. regds /P

Posted by Peter on January 12, 2011 at 07:57 AM CET #

Hi Peter, No I did not try application-vip. Many reasons for this : - For the database, this stuff is now handled by the SCAN. - application-vip is usually used by resources like Application Servers. I am looking for another simpler kind of software to setup a test. But, anyway, if you want to set up something related to the database, it is far easier and safer to work with the SCAN. Regards, Gilles

Posted by Gilles Haro on January 13, 2011 at 01:31 AM CET #

Hello, Take a look at ASCRS, a product from Oracle to install over the clusterware software. This product make Active/Passive configurations "easy". No need for "hand made" scripts, no need to create profiles to register resources, and a "simple" command line to control all. The problem is ASCRS only supports clusterware from releases 10.1, 10.2 and 11.1 not 11.2 (at the moment). Note: The ASCRS comes with the Fusion Middleware Companion CD. Regards, Rafael.

Posted by Rafael on January 14, 2011 at 01:43 AM CET #

Humm, that sounds interesting, indeed. I will surely have a look at this. May I ask if this tool needs licencing ? I am wondering why it wouldn't work with 11.2 ? May be some want us to use RacOneNode instead. Possible. Anyway, thank you for this info. Gilles

Posted by Gilles Haro on January 14, 2011 at 01:49 AM CET #

This article was really nicely published, and it also incorporates pretty several practical facts. I appreciated your expert method of writing this write-up. You could have developed it simple for me to totally grasp.

Posted by Ismael Shutte on January 20, 2011 at 02:04 AM CET #

Thank you for encouraging me ! Gilles

Posted by Gilles Haro on January 20, 2011 at 02:12 AM CET #

Hello Gilles Thanks a lot for this very good demo. But, just to be sure.... can realize this Active/Passive Cluster with the following : * Grid Infrastructure on all nodes * Oracle Database binaries (RAC mode) (it is mandatory to have Standalone Database binairies installed on both nodes ?) Thanks Regards Marcus

Posted by Marcus on February 05, 2011 at 03:23 AM CET #

Hello Marcus, I can see no reason why it should not work with RAC binaries. But if you have licences for RAC, you should consider Rac One Node instead. You can have multiple nodes but only define the nodes for the Active/Passive configuration in the OCR. Gilles

Posted by Gilles Haro on February 05, 2011 at 07:46 AM CET #

Thanks Gilles. This is a very good example for 11.2.

Posted by Mohan Poojari on April 01, 2011 at 02:09 AM CEST #

Hi, Using Oracle 11.2. My architecture is 2-Node RAC on one SAN box and want to set up 1-Node passive at secondary site connected to another SAN box. Can I use your method or need to have a 3rd party tool to do the failover to secondary site.? If I go for a manual failover what are the problems?

Posted by Sidhant on May 08, 2011 at 11:34 AM CEST #

Hello Gilles , Thank you for this post very useable. Pierre

Posted by Pierre on May 11, 2011 at 11:34 PM CEST #

Hi Gilles,

Thanks for your post. I originally had active/passive failover working with 11.1.0.7 based on oracle's whitepaper.. which used an application VIP to control the failover but I've found since I tried this approach using 11.2.0.2 this does not work very well. If there are more than 3 databases that need to failover then the server pretty much stalls and the failover does not work. Strangely 3 databases failover without problem. Anyway looking at your example, I guess the sensible thing would be to use SCAN to control the failover as currently I have SCAN disabled in my setup.

Sorry if this is a silly question. How do you have SCAN setup in your configuration? Do you have 3 seperate SCAN addresses for the 2 nodes in your setup?

many thanks for your time
Dav

Posted by Dav on June 15, 2011 at 03:14 PM CEST #

@Dav,
Hi Dav,
Sorry for the delay.
This is not a silly question.
The SCAN is only the gateway through which the client connects to the server, hiding the complexity of the architecture.
Oracle recommends having at least 3 different IPs that resolve to the SAME name. But, actually, it does not matter how many SCAN addresses you have. This is only for HA/scalability purposes.
For testing purposes, you can simply use 1 SCAN IP.
Gilles

Posted by guest on July 15, 2011 at 09:39 AM CEST #

@Marcus
Hi,
Standalone binaries are mandatory if you don't want to pay a RAC licence (on top of an Enterprise Edition licence).
But there is no technical issue with using Cluster binaries.

Posted by Gilles on July 15, 2011 at 09:41 AM CEST #

@Sidhant,
Hi,
if you have a RAC licence, simply use services to avoid connections to the "passive" node and keep the RAC features.
This "solution" is only for those who don't need RAC but still want to benefit from GI protection features.

Posted by guest on July 15, 2011 at 09:44 AM CEST #

Hi Gilles,
I configured this solution before reading your blog (thanks, very interesting tips), using SCAN as an application VIP. CT has no possibility to configure DNS neither DHCP, so my only SCAN is set in /etc/hosts and failovered through Sun Cluster. During tests I had to implement hard start/stop dependency with SCAN listener. If you unplug public interface, the db remains isolated on the node and doesn't relocate.

Posted by Agn on July 18, 2011 at 04:42 AM CEST #

Hi Gilles,

To achieve this failover capability, do we need to install 11GR2 grid infrastructure without interconnect or with interconnect. My understanding was we install Grid Infrastructure without interconnect and then install standalone database software, but everytime i tried to install Grid Infrastructure, installation does not allow me to skip the interconnect IP address.
Please can you clarify the Grid infrastructure installation steps.

Thanks in advance.
Fawad.

Posted by guest on July 18, 2011 at 10:27 AM CEST #

@Fawad,
I am afraid, Fawad, that you misunderstood the goal of Grid Infrastructure.
It is Ze Clusterware !
Therefore, it has to allow the cluster nodes to communicate with each other, and this is achieved through the Interconnect.
The difference with RAC is that it will mainly be used for "heartbeat" features; there won't be any block exchanges between nodes. That means it won't require 10Gb or even 1Gb bandwidth. But you will need an interconnect, even with GI.

Hope this helps.
Gilles

Posted by Gilles on July 18, 2011 at 11:55 AM CEST #

Excellent article.
I was able to setup 11.2.0.2 grid with 11.2.0.1 SE and achieve a cold failover of the database, following the steps you mentioned in your blog.
Just had to change the script a little to setup local listeners whenever a failover occured, to be able to access the DB using the SCAN. got the hostname and defined the local_listener string based on that.

How did you achieve this?
Did you use a pfile instead of the spfile?

Thanks and Regards
Upendra.

Posted by upendra satluri on August 04, 2011 at 09:06 AM CEST #

Thank you for your comment, Upendra.

You are right, I did not say much about client connections in case of failover. Surely, the connection gets broken, and the client simply has to reconnect.

Again, you are right when you say that, as the database moves from one node to another, the SCAN is the only way to make it seamless.

You simply went wrong in setting up the listener:
do not touch the LOCAL_LISTENER, but set the REMOTE_LISTENER to talk to the SCAN on its defined port. Things should go better.
Also, do not use TNS aliases, but the raw definition of the listener.
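For example (a sketch; the SCAN host below is a placeholder for your own, and the default listener port is assumed):

```
ALTER SYSTEM SET remote_listener='<your-scan-name>:1521' SCOPE=BOTH SID='*';
```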

I do use an spfile. pfiles belong to the past. Do not use them if you can avoid it.

Please let me know if this update helped you. I will then update the blog entry.

Regards,
Gilles

Posted by Gilles on August 04, 2011 at 09:36 AM CEST #

Hi Gilles,
Thanks for sharing the info..
Can you pls let me know how to do the same for appvip & listener as what Oracle provided steps for 11gr1 white paper. I have followed the Oracle 11gr1 whitepaper and got stuck with the dependencies hierarchy.Below is the workflow which i followed but the things are not working. Pls assist if I am in the right path.

Step1 rg1
#########################################################################################################################

crsctl add resource rg1 -type cluster_resource \
-attr "ACTION_SCRIPT=$CRS_HOME/crs/public/act_resgroup.pl, \
CHECK_INTERVAL=600"

./crsctl start resource rg1

Step2 rg1.vip
#########################################################################################################################

As root

$CRS_HOME/bin/appvipcfg create -network=1 -ip=xx.xx.xx.xx -vipname=rg1.vip -user=root -group=root

./crsctl setperm resource rg1.vip -u user:oracle:r-x

./crsctl start resource rg1.vip

Step3 rg1.listener
#########################################################################################################################

./crsctl add resource rg1.listener -type cluster_resource \
-attr "ACTION_SCRIPT=$CRS_HOME/crs/public/act_listener.pl, \
CHECK_INTERVAL=20,RESTART_ATTEMPTS=5, \
START_DEPENDENCIES=hard(rg1.vip) pullup(rg1.vip),STOP_DEPENDENCIES=hard(rg1.vip)"

./crsctl start resource rg1.listener

Step4 rg1.db_rg1
#########################################################################################################################

./crsctl add resource rg1.db_rg1 -type cluster_resource \
-attr "ACTION_SCRIPT=$CRS_HOME/crs/public/act_db.pl, CHECK_INTERVAL=20,RESTART_ATTEMPTS=5, \
START_DEPENDENCIES=hard(rg1.listener) pullup(rg1),STOP_DEPENDENCIES=hard(rg1.listener)"

./crsctl start resource rg1.db_rg1

Step5 rg1.head
#########################################################################################################################

./crsctl add resource rg1.head -type cluster_resource \
-attr "ACTION_SCRIPT=$CRS_HOME/crs/public/act_resgroup.pl, CHECK_INTERVAL=600,RESTART_ATTEMPTS=2, \
START_DEPENDENCIES=hard(rg1.listener,rg1.db_rg1,rg1)"

./crsctl start resource rg1.head

Thx

Posted by guest on August 18, 2011 at 08:34 AM CEST #

@Krish

11g Release 2 brings the SCAN feature, which helps with the listener and VIP. You no longer need to fail over the listener and VIP, as the SCAN handles this for you by providing a single set of IPs (resolved as ONE single DNS name).

Then, no matter where your database is located, you access it the same way, through the SCAN IP address.

Gilles

Posted by Gilles on August 19, 2011 at 05:13 AM CEST #

hi Gilles,

Thank you for the great article.

In my test environment, relocation of resource failed after 7-8 shutdown/relocation .It says relocation failed. there is no erros generated in alert/tracefiles. Only relocation failing. Any idea on this ?

Posted by guest on September 13, 2011 at 11:49 AM CEST #

Hi Gilles,

You said the SCAN listener works for you. It's true if you relocate your DB from one node to another (crash case / manual relocate case).

But if your VIP or listener is down, the DB is not relocated; there are no dependencies on the listener or VIP,
so it is impossible to connect to the DB.

Pierre

Posted by Pierre on September 23, 2011 at 11:29 AM CEST #

Gilles,

Thank you very much for a great article and explaining the things with example scripts.

To one of the posts, you replied "Do not use tns aliases, but the raw definition of the listener". Can you please explain what you meant by "raw definition of the listener"?

One thing that I noticed is - once the database instance is relocated to other node, without doing ${GRID_HOME}/bin/lsnrctl stop and ${GRID_HOME}/bin/lsnrctl start, clients are not able to connect to the database even when they are using HOST=<SCAN_NAME> in their tnsnames.ora stub.

This is not a big thing as clients will see a disruption in their connection anyway and we can incorporate lsnrctl stop/start as part of the action script. In my view, this small point should _not_ justify for going towards RAC or RAC One Node and spend more money in licensing. In the case of RAC, if TAF (Transparent Application Failover) smoothly transitions all the sessions that are in the middle of transactions to other surviving nodes, then RAC is a true HA solution. But, based on my understanding, TAF can only transition SELECTs. If that is the case, what happens to the sessions that are in the middle of DML?

For these reasons, I like this solution as it provides HA in the case of a node failure. Also, it doesn't cost extra money and we don't have to pay the penalty for Global Cache latency issues as we are not going horizontal by running database simultaneously from multiple nodes.

Going horizontal might be beneficial in share nothing architecture. But, I wonder how one can get more throughput in share everything architecture by going horizontal.

Thanks again for a great article and sharing your knowledge with us.

Moorthy.

Posted by Moorthy Rekapalli on September 23, 2011 at 08:08 PM CEST #

Hi Gilles,

I need to implement the same active-passive cluster on RHEL5, with oracle 11g R2. I have some question regarding this.

1) We are using IBM SAN storage. So in this configuration the shared storage would be mounted always on both node, or just on active node ?

2) How many SCAN VIP should I configure ?

3) Please confirm, while installing grid infrastructure I need to select option "Install grid infra for a cluster" and while intalling database, I need to select option "Select single instance database installation" ?

Thanks in advance.....

Posted by Pankaj Sharma on October 19, 2011 at 04:18 AM CEST #

@Moorthy.
Thanks for interest regarding this entry.
1. You need shared storage, accessible from both nodes. Ideally, ASM on your SAN.
2. Oracle recommends 3 SCAN IPs (it is not a VIP). This is for scalability reasons. For testing purposes, one fits well. You can extend it to 3 later on.
3. Exactly !

Gilles

Posted by guest on October 19, 2011 at 05:23 AM CEST #

Hi Gilles,

Thanks for your quick response.
As per your suggestion in previous comment, I have implemented this solution in my test environment and every thing is working perfectly. Below is summary about test setup:

1) We have license for Oracle Database 11g R2 Enterprise Edition.
2) We have IBM SAN storage, accessible through ASM on both nodes.
3) As per your comment, while installing grid infrastructure I selected option "Install grid infra for a cluster" and while installing database, I selected option "Select single instance database installation".

Although you have described in your first comment that this solution does not require any extra license,

just to make sure for my upper management, I want to ask whether the Grid Infrastructure and ASM are both included in the 11g R2 Enterprise Edition license or not?

One more thing want to know that what are the other options to use instead of ASM ? Does they required any license ?

Thanks in advance.

Posted by Pankaj Sharma on October 21, 2011 at 06:15 AM CEST #

Hi,
1- Good thing ;-)
2- An Oracle storage would have been a better choice ;-)
3- I confirm. As long as RAC or RACOne is not used (single instance), you don't have to reopen your purse. ASM and Grid Infra are part of the Oracle licence (SE or EE).
4- Instead of ASM, any type of cluster filesystem might do the job, but check that it is an Oracle-validated solution. For example, OCFS, NFS, etc. might work. But ASM is definitely the right choice.

Gilles

Posted by Gilles on October 21, 2011 at 06:19 AM CEST #

Hi Gilles,

Thanks a lot for your reply.

I gave the demo for this solution to my upper management on the test environment and finally I got the sign off to implement this on production.

Thanks again...

Posted by Pankaj Sharma on October 21, 2011 at 09:18 AM CEST #

Gilles,

Although a newbie in Oracle Rac and clusterware, I would like to ask a question concerning your implementation. I was asked to find and provide similar solution to my project manager. Is that solution possible on Windows too, without using the Oracle Fail/Safe software and using only the Grid infrastructure 11g R2 software. We will not have to pay for licensing right just for the database, right??

Thank you very much for your answer

Posted by guest on November 24, 2011 at 06:55 AM CET #

Hi Gilles,

i have configured a single instance database using your post. But the DB is not accessible over SCAN ip. please let me know if anything additional needs to be done.

regards

prakash

Posted by guest on December 06, 2011 at 01:51 AM CET #

Hi,
I'm little screwd here. I have a 2 node oracle RAC cluster. I tried the above steps and its throwing error for me. when i run
./crsctl add resource orcl.db -type cluster_resource -file /u01/app/oracle/product/11.2.0/dbhome_1/crs/11.2.0/HA_scripts/myNewresource.txt

It says "
CRS-2510: Resource 'ora.DTA.dg' used in dependency 'hard' does not exist or is not registered
CRS-2514: Dependency attribute specification 'hard' is invalid in resource 'orcl.db'
CRS-4000: Command Add failed, or completed with errors.

Need help!

Posted by guest on January 12, 2012 at 06:28 AM CET #

Hi karanth.megha,

ora.DTA.dg and ora.FRA.dg in the example are the names of the ASM disk groups (DTA and Flash Recovery Area) the database uses.
You have to adapt these names to your system.

Gilles

Posted by Gilles on January 12, 2012 at 07:54 AM CET #

Hello Gilles,

This is really an excellent demo. I do have a question about the active / passive database state itself. If for some reason a dba goes on to the passive node and tries to bring up the database on that node what will happen to the primary database that is running on the active node. Will it fail to bring the db or does it disconnect from the primary node and bring it up on the passive node. Just want to make sure what the Consequences are by accidentally bringing the database up on the passive node. Thanks

Posted by Raghu on February 02, 2012 at 08:02 PM CET #

Hi Gilles,

thanks for this excellent guide.

All works perfect, database relocate works perfectly.
But DB is not accessible through SCAN ip, while works if I point public/vip ip server (node1 or node2).
My client reports:
"Listener refused the connection with the following error:
ORA-12505, TNS:listener does not currently know of SID given in connect descriptor"

There's any setting that I forgot?

Thank you

BR

Posted by guest on February 16, 2012 at 05:18 PM CET #

Hi Gilles,

your article is absolutely excellent! I setup 2 cluster HA that work perfectly!

My tips:
- you need to sync between nodes:
  $ORACLE_BASE/admin;
  $ORACLE_HOME/dbs;
  $ORACLE_HOME/network/admin/tnsnames.ora (edit tnsnames.ora and update the hostname)
- you need to register the remote_listener (e.g. alter system set remote_listener='mydatabasename.domain.tld:1521' scope=both sid='*';)

Clients such as Oracle SQL Developer must be able to resolve the domain and hostnames.
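
A minimal sketch of the remote_listener registration from the tips above (the SCAN name and port are placeholders for your own values):

```sql
-- Register the database with the SCAN listener so clients can keep
-- connecting through the SCAN address after a relocation.
-- 'mycluster-scan.domain.tld' and port 1521 are example values.
ALTER SYSTEM SET remote_listener = 'mycluster-scan.domain.tld:1521' SCOPE=BOTH SID='*';

-- Verify the registration from any node:
-- $ lsnrctl services LISTENER_SCAN1
```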

THANKS!

Posted by guest on March 07, 2012 at 12:17 PM CET #

Hi Gilles

This is a very useful solution for the active/passive mode, but I need more clarification regarding the points below:
1. Should I install the Grid Infrastructure on both nodes?
2. What should I update in the above script if I want to adopt a manual failover?

Regards,
Antoine

Posted by Antoine Nakouzi on March 07, 2012 at 04:13 PM CET #

Hi Antoine,
1. Of course, GI needs to be installed for a CLUSTER.
It's an option in the installer: you run the installer from one node, choose all the nodes on which you decided to install GI, and let the installer copy the binaries to all the nodes.

2. Can you be clearer?

Rgds
Gilles

Posted by Gilles on March 07, 2012 at 04:20 PM CET #

Hi Gilles,

Thanks for the detailed steps; they are very helpful. However, I have another situation here.

My company is planning to implement ASM and I am tasked to evaluate the possibility to integrate ASM with our current infrastructure system.

We are currently running Oracle 11.2.0.2 on SLES 11 Linux, clustered with Pacemaker. The nice feature of Pacemaker is that it can fail over one database even though there are multiple databases on the same server. In other words, the failure of one database will not affect the other databases residing on the same server; we can easily fail over just one database.

My question is: is it possible for a single ASM-based database to fail over to the other node without causing the other databases on the same server to fail over as well? Clusterware does not allow this, as stated below:

********************************************************************************************************************************************
Extracted from: http://docs.oracle.com/cd/E11882_01/server.112/e18951/asmcon.htm

Oracle ASM instances can be clustered using Oracle Clusterware; there is one Oracle ASM instance for each cluster node. If there are several database
instances for different databases on the same node, then the database instances share the same single Oracle ASM instance on that node.
If the Oracle ASM instance on a node fails, then all of the database instances on that node also fail. Unlike a file system driver failure, an Oracle ASM instance
failure does not require restarting the operating system. In an Oracle RAC environment, the Oracle ASM and database instances on the surviving nodes
automatically recover from an Oracle ASM instance failure on a node.
********************************************************************************************************************************************

Since Clusterware does not support single-database failover, I am actually thinking of doing the following:

1. Setup OS-clustered servers (2 servers, let's name them ASMSVR1 & ASMSVR2) using Pacemaker;
2. Install Grid Infrastructure on ASMSVR1 & ASMSVR2 with the 'Standalone Server' option (there are two options, namely 'for a Cluster' and 'for a Standalone Server'). Two independent ASM instances will then be created on the two servers. Let the ASM instances on ASMSVR1 & ASMSVR2 always be up;
3. On ASMSVR1, create a new database 'ASMDB1'.

Now how do I register ASMDB1 on ASMSVR2? Do I need to create a new resource as described in your blog here? Or can I simply fail ASMDB1 over from ASMSVR1 to ASMSVR2 using Pacemaker?

Sorry for the long email and thanks in advance for your kind reply.

Posted by Richard on July 11, 2012 at 11:04 AM CEST #

About

This blog presents my personal thoughts and findings around the Oracle Database. These entries present the outcomes from experiences and testing regarding various technological aspects of the Oracle database. Opinions are my own and not necessarily those of Oracle Corporation.
