
September 2009 Archives

September 4, 2009

Para-virtualizing disk and network drivers for Linux HVM Guests

Recently one of Oracle's customers decided to run their Red Hat Enterprise Linux 3 systems on Oracle VM. Running them hardware virtualized is very straightforward. However, you may want to improve performance by using para-virtualized drivers. This entry describes how to do it, what is and is not possible, and what to pay attention to.

To begin with a short description: the picture below shows that Oracle VM can run both hardware-virtualized and para-virtualized guests. Hardware virtualization uses device emulation. In general, para-virtualization gives better performance, because the guest VMs use drivers that are aware of the hypervisor instead of emulated devices.

pvm-hvm.jpg

First, verify that Oracle VM supports the platform. In this case the Oracle VM online documentation states that Red Hat Enterprise Linux 3 is certified for hardware virtualization and can even run with para-virtualized drivers.
To use the para-virtualized drivers, at least Red Hat Enterprise Linux 3 Update 9 is required.

Having para-virtualized drivers in place makes the fully virtualized guest VM 'para-virtualization-aware'. The benefit is a significant improvement in I/O performance.

Please note that having para-virtualized drivers is not the same as running a para-virtualized VM. In this situation the only improvement you will see is in block (disk) and network performance.

To start using the para-virtualized drivers, first download them from the Oracle Unbreakable Linux Network (ULN). The file you need is kmod-xenpv-*el3.i686.rpm, which you can find in the appropriate channel. It needs to be installed in each RHEL 3 HVM guest (for example: rpm -ivh kmod-xenpv-smp-0.1-9.el3.i686.rpm).

Next, make sure the modules installed by the RPM have been copied to /lib/modules/`uname -r`/extra/xenpv. Then load the modules:

[root@gridnode01 /]# insmod xen-platform-pci.o
[root@gridnode01 /]# insmod xen-balloon.o
[root@gridnode01 /]# insmod xen-vbd.o
[root@gridnode01 /]# insmod xen-vnif.o
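The four insmod calls can be wrapped in a small script, so that a missing module is noticed before the network and disks are switched over. This is a minimal sketch, assuming the RPM placed the 2.4-style .o files in /lib/modules/`uname -r`/extra/xenpv; the function name load_xenpv and the DRY_RUN switch are hypothetical helpers, not part of the kmod-xenpv package.

```bash
#!/bin/bash
# Hypothetical helper: load the Xen PV modules in the order used above.
# Assumption: kmod-xenpv installed the .o files (2.4 kernel, RHEL 3) in
#   /lib/modules/$(uname -r)/extra/xenpv
XENPV_MODULES="xen-platform-pci xen-balloon xen-vbd xen-vnif"

load_xenpv() {
  local moddir="${1:-/lib/modules/$(uname -r)/extra/xenpv}"
  local m
  for m in $XENPV_MODULES; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      # Dry run: only show the command that would be executed.
      echo "insmod $moddir/$m.o"
    elif [ -f "$moddir/$m.o" ]; then
      insmod "$moddir/$m.o" || return 1
    else
      echo "missing module: $moddir/$m.o" >&2
      return 1
    fi
  done
}
```

Run load_xenpv as root inside the guest, or DRY_RUN=1 load_xenpv to only print the commands it would run.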

Also make sure the eth0 alias is set up in /etc/modules.conf:

alias eth0 xen-vnif
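The alias change can be made idempotent, so that running it twice does not leave two eth0 lines. A minimal sketch; ensure_eth0_alias is a hypothetical helper name. It replaces an existing "alias eth0 ..." line (for example one pointing at the driver for the emulated NIC) instead of adding a second one, and sed keeps a .bak copy when an existing line is edited.

```bash
#!/bin/bash
# Hypothetical helper: make eth0 use the PV network driver (xen-vnif).
# An existing "alias eth0 ..." line is replaced; otherwise the alias is
# appended. Basic regular expressions are used in sed for older GNU sed.
ensure_eth0_alias() {
  local conf="${1:-/etc/modules.conf}"
  if grep -qE '^[[:space:]]*alias[[:space:]]+eth0[[:space:]]' "$conf" 2>/dev/null; then
    sed -i.bak 's/^[[:space:]]*alias[[:space:]]\{1,\}eth0[[:space:]].*/alias eth0 xen-vnif/' "$conf"
  else
    echo "alias eth0 xen-vnif" >> "$conf"
  fi
}
```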

So, for example, your vm.cfg for the HVM guest may look like this:

acpi = 1
apic = 1
boot = 'c'
bootloader = '/usr/bin/pygrub'
builder = 'hvm'
device_model = '/usr/lib/xen/bin/qemu-dm'
disk = ['file:/OVS/running_pool/112_pdrtest/system.img,hda,w',
'file:/OVS/running_pool/112_pdrtest/oh0.img,hdb,w',
]
disk_other_config = []
kernel = '/usr/lib/xen/boot/hvmloader'
keymap = 'en-us'
maxmem = 1024
memory = 1024
name = '12_pdrtest'
on_crash = 'restart'
on_reboot = 'restart'
pae = 1
serial = 'pty'
uuid = 'f2e29b25-391f-74e8-b0cf-cbc69bb7905'
vcpus = 1
vif = ['type=ioemu, mac=00:16:4E:1C:42:7E, bridge=xenbr1']
vnc = 1
vncconsole = 1
vnclisten = '0.0.0.0'
vncpasswd = 'oracle'
vncunused = 1

What now needs to change is the vif configuration line: the "type=ioemu" entry has to be removed from it.
The other change concerns disk I/O. New disk devices need to be added to the vm.cfg file, and you have to make sure they use the xen-vbd disk driver. Please note that once the new disk is created, the data needs to be moved to that disk.
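The vif edit can be done with sed. A minimal sketch, assuming "type=ioemu," comes first in the vif string as in the vm.cfg example above; strip_ioemu is a hypothetical helper, and the disk changes are not covered by it.

```bash
#!/bin/bash
# Hypothetical helper: remove "type=ioemu," (and any following blanks)
# from a guest's vm.cfg, leaving mac and bridge untouched.
# sed keeps the original file as vm.cfg.bak.
strip_ioemu() {
  local cfg="$1"
  sed -i.bak -e 's/type=ioemu,[[:space:]]*//' "$cfg"
}
```

Shut the guest down first and run it against the guest's file, e.g. strip_ioemu /OVS/running_pool/<guest>/vm.cfg. The example vif line then becomes vif = ['mac=00:16:4E:1C:42:7E, bridge=xenbr1'].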

Now that the disk and network drivers are para-virtualized, performance will be better.

The ability to keep running older operating systems in a virtualized environment is great.
Oracle certifies most common operating systems on Oracle VM.
Oracle software, for instance Real Application Clusters, still needs to be certified with Oracle VM and the guest VM OS image. See also this link for best practices and certification.

Rene Kundersma
Oracle Technology Services, The Netherlands

September 12, 2009

Mounting Oracle VM Templates without LVM in them

In the blog entry "Provisioning your GRID with Oracle VM Templates" I explained how to mount a filesystem from an Oracle VM template that was built on a logical volume. However, it seems that Oracle VM templates are also shipped without logical volumes, just plain ext3.

So how would one mount that?

First, set up the loop device to see what's in it.

[root@pts05 ]# losetup /dev/loop99  /OVS/running_pool/gridnode01/System.img
[root@pts05 ]# fdisk -lu /dev/loop99
Disk /dev/loop99: 6530 MB, 6530871808 bytes
255 heads, 63 sectors/track, 793 cylinders, total 12755609 sectors
Units = sectors of 1 * 512 = 512 bytes
       Device Boot      Start         End      Blocks   Id  System
/dev/loop99p1   *          63       64259       32098+  83  Linux
/dev/loop99p2           64260     8562644     4249192+  83  Linux
/dev/loop99p3         8562645    12739544     2088450   82  Linux swap / Solaris

So, three plain partitions, no logical volumes in it.

What to do next? It seems it is NOT possible to inform the kernel about these three partitions:

[root@pts05 p]# partprobe -s /dev/loop99
Error: Error informing the kernel about modifications to partition /dev/loop99p1 -- Invalid argument.  This means Linux won't know about any changes you made to /dev/loop99p1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Error informing the kernel about modifications to partition /dev/loop99p2 -- Invalid argument.  This means Linux won't know about any changes you made to /dev/loop99p2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Error informing the kernel about modifications to partition /dev/loop99p3 -- Invalid argument.  This means Linux won't know about any changes you made to /dev/loop99p3 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Warning: The kernel was unable to re-read the partition table on /dev/loop99 (Invalid argument).  This means Linux won't know anything about the modifications you made until you reboot.  You should reboot your computer before doing anything with /dev/loop99.

Okay, since I don't want to reboot, this is not an option. Maybe we need to set up the loop device with an offset to the partition I need.

Since the sector size is 512 bytes and I start at sector 64260, my offset will be 512 * 64260 = 32901120. Why start at 64260? Because I assume partition one is the /boot partition, and I want the filesystem on partition two.
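The same arithmetic (byte offset = start sector * sector size) works for every partition in the fdisk -lu listing. A small sketch; partition_offsets is a hypothetical helper that reads fdisk -lu output on stdin and handles the extra boot-flag column.

```bash
#!/bin/bash
# Sketch: turn "fdisk -lu" partition lines into losetup byte offsets.
# offset = start sector * sector size (512 bytes, per the "Units" line).
partition_offsets() {
  local sector_size="${1:-512}"
  local dev start
  # Partition lines start with /dev/; a "*" in column 2 is the boot flag,
  # in which case the start sector is in column 3.
  awk '/^\/dev\// { s = ($2 == "*") ? $3 : $2; print $1, s }' |
    while read -r dev start; do
      echo "$dev $((start * sector_size))"
    done
}
```

For example, fdisk -lu /dev/loop99 | partition_offsets prints 32901120 for /dev/loop99p2, the value passed to losetup -o below.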

[root@pts05 tools]# losetup /dev/loop98  /OVS/running_pool/gridnode01/System.img -o 32901120
[root@pts05 tools]# losetup -a | grep offset
/dev/loop98: [0811]:6127628 (/OVS/running_pool/gridnode01/System.img), offset 32901120

Yes, there it is! Mounting is easy now!

[root@pts05 p]# mkdir /tmp/p; mount /dev/loop98 /tmp/p/
[root@pts05 p]# df -m | grep loop98
/dev/loop98               4013      2071      1901  53% /tmp/p
[root@pts05 p]# ls -l /tmp/p/
total 196
drwxr-xr-x  2 root root  4096 Sep  3 00:03 bin
drwxr-xr-x  2 root root  4096 Mar 25 05:44 boot
drwxr-xr-x  2 root root  4096 Sep  4 09:58 crs
drwxr-xr-x  2 root root  4096 Apr 10 07:14 dev
drwxr-xr-x 96 root root 12288 Sep 10 07:03 etc
drwxr-xr-x  4 root root  4096 Sep  8 21:30 home
drwxr-xr-x 13 root root  4096 Sep  3 00:03 lib
drwx------  2 root root 16384 Mar 25 05:44 lost+found
drwxr-xr-x  2 root root  4096 Jan  9  2009 media
drwxr-xr-x  2 root root  4096 Jan 21  2009 misc
drwxr-xr-x  3 root root  4096 Sep  4 09:40 mnt
dr-xr-xr-x  2 root root  4096 Sep  8 21:22 net
drwxr-xr-x  4 root root  4096 Sep  9 22:03 opt
drwxr-xr-x  2 root root  4096 Mar 25 05:44 proc
drwxr-x---  2 root root  4096 Sep  8 22:30 root
drwxr-xr-x  2 root root 12288 Sep  9 22:05 sbin
drwxr-xr-x  2 root root  4096 Mar 25 05:44 selinux
drwxr-xr-x  2 root root  4096 Jan  9  2009 srv
drwxr-xr-x  2 root root  4096 Mar 25 05:44 sys
drwxr-xr-x  3 root root  4096 Apr 10 07:03 tftpboot
drwxrwxrwt 17 root root  4096 Sep 10 14:57 tmp
drwxr-xr-x  2 root root  4096 Sep  2 22:13 u01
drwxr-xr-x 14 root root  4096 Apr 10 07:03 usr
drwxr-xr-x 21 root root  4096 Apr 10 07:05 var

Rene Kundersma
Oracle Technology Services, The Netherlands

September 18, 2009

Cold failover for a single instance RAC database

This blog posting is about protecting the instance of a 10.2/11.1 single-instance RAC database so that it can be used in a cold-failover setup. I want to refer to the great document "Using Oracle Clusterware to Protect a Single Instance Oracle Database 11g", written by my colleague Philip Newlan, since most of the input comes from there.

That PDF describes how to make sure a single-instance database can fail over with the use of Oracle Clusterware.

With this posting I want to show how to do this for a single-instance RAC database, where you have to manage 'instance1' and 'instance2' instead of just one instance.

This choice was made on purpose for a customer, since the Grid Control agent may have problems with an instance[number] travelling from node1 to node2. Another reason for this awkward solution is that the application, for some reason, cannot run with two instances concurrently.

Also, because deploying instances with sequential instance numbers is the standard in their grid environment, the choice was made to use different instance numbers instead of one.

In this posting I will also show the required updates to tnsnames and the spfile.

First, I made sure the entries for the RAC database, its services and its instances were removed from the OCR. Then a "resource group" is created; this is the container for all the resources.

oracle@pts0138([crs]):/ora/product/11.1.0/crs> crs_profile -create \
oss.xdbprk.rg -t application -a \
/ora/product/11.1.0/crs/crs/public/act_resgroup.pl -o "ci=600" 
oracle@pts0138([crs]):/ora/product/11.1.0/crs> crs_register oss.xdbprk.rg

Now, let's verify the new entry:

oracle@pts0138([crs]):/ora/product/11.1.0/crs/crs/public> crsstat | grep rg
HA Resource                                   Target     State
oss.xdbprk.rg                                   OFFLINE    OFFLINE

The scripts mentioned in the pdf are placed in $CRS_HOME/crs/public on both nodes. I made sure the scripts are executable and tested them:

export CLUSTERWARE_HOME=/ora/product/11.1.0/crs/
export ORACLE_HOME=/ora/product/10.2.0/db_2
export _USR_ORA_LANG=$ORACLE_HOME 
export _USR_ORA_SRV=xdbprk2 
export _USR_ORA_FLAGS=1 
oracle@pts0138(xdbprk2):/tmp> $CLUSTERWARE_HOME/crs/public/act_db.pl start
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 10:12:21 2009
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
SQL> Connected to an idle instance.
SQL> ORACLE instance started.
Total System Global Area  536870912 bytes
Fixed Size                  2085360 bytes
Variable Size             150998544 bytes
Database Buffers          377487360 bytes
Redo Buffers                6299648 bytes
Database mounted.
Database opened.
SQL> Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,
Data Mining and Real Application Testing options

I also tested the stop function:

oracle@pts0138(xdbprk2):/tmp> $CLUSTERWARE_HOME/crs/public/act_db.pl stop
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 10:12:37 2009
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
SQL> Connected.
SQL> Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,
Data Mining and Real Application Testing options
oracle@pts0138(xdbprk2):/ora/product/11.1.0/crs/crs/public>

The actions above were executed on both nodes, just to verify that the instance could be started and stopped with the scripts. The only problem was that the ORACLE_SID had to be changed for each node, as each node has a different ORACLE_SID for that database. With only one SID you would not have this problem.

Then, the failover resource was created and registered.

oracle@pts0138([crs]):/tmp> crs_profile -create oss.xdbprk.db-cold-failover \
-t application -r oss.xdbprk.rg -a \
/ora/product/11.1.0/crs/crs/public/act_db.pl \
-o "ci=20,ra=5,osrv=xdbprk,ol=/ora/product/10.2.0/db_2,oflags=1,rt=600"
oracle@pts0138([crs]):/tmp> crs_register oss.xdbprk.db-cold-failover

The value osrv=xdbprk will never work, as this is not the correct instance name on any of the nodes. Even if I set osrv=xdbprk1, the script would only work on one of the nodes, i.e. the node that has the appropriate init.ora etc.

So, leaving the value at osrv=xdbprk, I hard-coded the node-specific ORACLE_SID in act_db.pl on both sides of the cluster. Since the value is now hard-coded, it should work. This clearly limits reuse of the script for other databases, so I had better rename the script to act_db_xdbprk.pl if I do this for real.
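Instead of editing act_db.pl separately on each node, a small wrapper could pick the SID from the host name and then hand over to the unchanged act_db.pl. This is a sketch under assumptions: the wrapper (for example act_db_xdbprk.sh) and its functions are hypothetical, the host-to-SID mapping is taken from this setup (pts0101 runs xdbprk1, pts0138 runs xdbprk2), and it assumes act_db.pl reads the instance name from _USR_ORA_SRV, as in the manual start/stop test above.

```bash
#!/bin/bash
# Hypothetical wrapper around act_db.pl: set the node-specific instance
# name, then call the generic action script unchanged.
CRS_PUBLIC=/ora/product/11.1.0/crs/crs/public

# Mapping from this setup; adjust for other clusters.
sid_for_host() {
  case "$1" in
    pts0101*) echo xdbprk1 ;;
    pts0138*) echo xdbprk2 ;;
    *) return 1 ;;
  esac
}

run_act_db() {
  local host sid
  host=$(hostname -s)
  sid=$(sid_for_host "$host") || { echo "no SID mapped for $host" >&2; return 1; }
  # Assumption: act_db.pl takes the instance name from _USR_ORA_SRV.
  export _USR_ORA_SRV="$sid"
  exec "$CRS_PUBLIC/act_db.pl" "$@"
}

# Clusterware calls action scripts with an action argument (start, stop,
# check, ...); without arguments this file only defines the functions.
if [ "$#" -gt 0 ]; then
  run_act_db "$@"
fi
```

The same file can be copied to both nodes, and the resource's action script (the -a argument of crs_profile -create) would then point at the wrapper instead of act_db.pl.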

So, what do the resource profiles look like now?

oracle@pts0101([crs]):/var/opt/oracle> crs_profile -print oss.xdbprk.rg
NAME=oss.xdbprk.rg
TYPE=application
ACTION_SCRIPT=/ora/product/11.1.0/crs/crs/public/act_resgroup.pl
ACTIVE_PLACEMENT=0
AUTO_START=restore
CHECK_INTERVAL=600
DESCRIPTION=oss.xdbprk.rg
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=
OPTIONAL_RESOURCES=
PLACEMENT=balanced
REQUIRED_RESOURCES=
RESTART_ATTEMPTS=1
SCRIPT_TIMEOUT=60
START_TIMEOUT=0
STOP_TIMEOUT=0
UPTIME_THRESHOLD=7d
USR_ORA_ALERT_NAME=
USR_ORA_CHECK_TIMEOUT=0
USR_ORA_CONNECT_STR=/ as sysdba
USR_ORA_DEBUG=0
USR_ORA_DISCONNECT=false
USR_ORA_FLAGS=
USR_ORA_IF=
USR_ORA_INST_NOT_SHUTDOWN=
USR_ORA_LANG=
USR_ORA_NETMASK=
USR_ORA_OPEN_MODE=
USR_ORA_OPI=false
USR_ORA_PFILE=
USR_ORA_PRECONNECT=none
USR_ORA_SRV=
USR_ORA_START_TIMEOUT=0
USR_ORA_STOP_MODE=immediate
USR_ORA_STOP_TIMEOUT=0
USR_ORA_VIP=
oracle@pts0101([crs]):/var/opt/oracle> crs_profile -print  oss.xdbprk.db-cold-failover
NAME=oss.xdbprk.db-cold-failover
TYPE=application
ACTION_SCRIPT=/ora/product/11.1.0/crs/crs/public/act_db.pl
ACTIVE_PLACEMENT=0
AUTO_START=restore
CHECK_INTERVAL=20
DESCRIPTION=oss.xdbprk.db-cold-failover
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=
OPTIONAL_RESOURCES=
PLACEMENT=balanced
REQUIRED_RESOURCES=oss.xdbprk.rg
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_TIMEOUT=600
STOP_TIMEOUT=0
UPTIME_THRESHOLD=7d
USR_ORA_ALERT_NAME=
USR_ORA_CHECK_TIMEOUT=0
USR_ORA_CONNECT_STR=/ as sysdba
USR_ORA_DEBUG=0
USR_ORA_DISCONNECT=false
USR_ORA_FLAGS=1
USR_ORA_IF=
USR_ORA_INST_NOT_SHUTDOWN=
USR_ORA_LANG=/ora/product/10.2.0/db_2
USR_ORA_NETMASK=
USR_ORA_OPEN_MODE=
USR_ORA_OPI=false
USR_ORA_PFILE=
USR_ORA_PRECONNECT=none
USR_ORA_SRV=xdbprk
USR_ORA_START_TIMEOUT=0
USR_ORA_STOP_MODE=immediate
USR_ORA_STOP_TIMEOUT=0
USR_ORA_VIP=

Okay, now the basic test of starting the resource. First, everything is down:

oracle@pts0101([crs]):/var/opt/oracle> crsstat
HA Resource                                   Target     State
-----------                                   ------     -----
oss.xdbprk.db-cold-failover                     OFFLINE    OFFLINE
oss.xdbprk.rg                                   OFFLINE    OFFLINE

Then, the start:

oracle@pts0101([crs]):/var/opt/oracle> crs_start  oss.xdbprk.db-cold-failover
Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0138`
Start of `oss.xdbprk.db-cold-failover` on member `pts0138` succeeded.
oracle@pts0138([crs]):/var/opt/oracle> ps -ef | grep smon | grep dbprk
oracle   31568     1  0 11:51 ?        00:00:00 ora_smon_xdbprk2

And the relocate:

oracle@pts0101([crs]):/var/opt/oracle> crs_relocate -f oss.xdbprk.db-cold-failover
Attempting to stop `oss.xdbprk.db-cold-failover` on member `pts0138`
Stop of `oss.xdbprk.db-cold-failover` on member `pts0138` succeeded.
Attempting to stop `oss.xdbprk.rg` on member `pts0138`
Stop of `oss.xdbprk.rg` on member `pts0138` succeeded.
Attempting to start `oss.xdbprk.rg` on member `pts0101`
Start of `oss.xdbprk.rg` on member `pts0101` succeeded.
Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0101`
Start of `oss.xdbprk.db-cold-failover` on member `pts0101` succeeded.

Let's verify if it runs on the other node:

oracle@pts0101([crs]):/var/opt/oracle> ps -ef | grep smon | grep dbprk
oracle   11568     1  0 11:55 ?        00:00:00 ora_smon_xdbprk1

And stopped on the original:

oracle@pts0138(xdbprk2):/ora/product/10.2.0/db_2/dbs> ps -ef | grep smon | grep dbprk
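The smon checks above can be scripted, which also helps when verifying that only one instance runs at a time. A minimal sketch; smon_instances is a hypothetical helper that extracts instance names from ps -ef output (where the command starts in field 8).

```bash
#!/bin/bash
# Sketch: print the instance names of running ora_smon_<SID> processes,
# reading "ps -ef" output from stdin. The "grep smon" process itself does
# not match, because its command field is "grep", not "ora_smon_...".
smon_instances() {
  awk '{ for (i = 8; i <= NF; i++) if ($i ~ /^ora_smon_/) { sub(/^ora_smon_/, "", $i); print $i } }'
}
```

Assuming SSH user equivalence for oracle between the nodes, for h in pts0101 pts0138; do echo "$h: $(ssh "$h" ps -ef | smon_instances | grep '^xdbprk')"; done should list an xdbprk instance on exactly one node.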

Okay, what is left to do from here:

Service names that used to be managed by CRS now have to be hard-coded in the spfile, so that they register with the listener each time:

Added to service_names in the spfile:

dbprk.xe.grid
dbprk.xe.supp
dbprk.xe.link

Also, each node uses its own local listener. I made sure this is in the spfile:

xdbprk1.local_listener='pts0101-LOCAL-LISTENER'
xdbprk2.local_listener='pts0138-LOCAL-LISTENER'

To make sure only one instance can be started, the cluster_database_instances parameter is set to 1:

*.cluster_database_instances=1
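One way to put these settings into the shared spfile is ALTER SYSTEM ... SCOPE=SPFILE. The sketch below only prints the statements; the function name is hypothetical, and it assumes the aliases pts0101-LOCAL-LISTENER and pts0138-LOCAL-LISTENER resolve through tnsnames.ora on the respective nodes.

```bash
#!/bin/bash
# Sketch: print ALTER SYSTEM statements for the spfile settings above.
# cluster_database_instances is a static parameter, so SCOPE=SPFILE is
# used and the change takes effect at the next instance start.
spfile_settings_sql() {
  cat <<'EOF'
ALTER SYSTEM SET service_names='dbprk.xe.grid,dbprk.xe.supp,dbprk.xe.link' SCOPE=SPFILE SID='*';
ALTER SYSTEM SET local_listener='pts0101-LOCAL-LISTENER' SCOPE=SPFILE SID='xdbprk1';
ALTER SYSTEM SET local_listener='pts0138-LOCAL-LISTENER' SCOPE=SPFILE SID='xdbprk2';
ALTER SYSTEM SET cluster_database_instances=1 SCOPE=SPFILE SID='*';
EOF
}
```

On the node where the instance is running, { spfile_settings_sql; echo exit; } | sqlplus -S "/ as sysdba" would apply them.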

For tnsnames, a normal failover entry is created: if the first instance is down, the next one will be found (and should be running):

oracle@pts0101(xdbprk1):/ora/dbprk/admin/xdbprk1/bdump> tnsping dbprk.xe.grid
TNS Ping Utility for Linux: Version 10.2.0.4.0 - Production on 18-SEP-2009 12:39:14
Copyright (c) 1997,  2007, Oracle.  All rights reserved.
Used parameter files:
/etc/oss/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=pts0101-grid.nl.eu.abnamro.com)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=pts0138-grid.nl.eu.abnamro.com)(PORT=1521))(LOAD_BALANCE=ON))(CONNECT_DATA=(SERVICE_NAME=dbprk.xe.grid)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=20))))
OK (10 msec)
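A tnsnames.ora entry matching the descriptor that tnsping resolved above could look like this, wrapped in a small shell function that appends it to a file. The layout is reconstructed from the tnsping output only, so the actual file may differ (for example in how domains are handled via sqlnet.ora); add_tns_entry is a hypothetical helper.

```bash
#!/bin/bash
# Sketch: append an entry equivalent to the descriptor tnsping printed.
# With LOAD_BALANCE=ON the client picks one of the two addresses at
# random; connect-time failover is expected to try the other address if
# the first one cannot serve the service.
add_tns_entry() {
  local tnsfile="${1:?usage: add_tns_entry <tnsnames.ora>}"
  cat >> "$tnsfile" <<'EOF'
dbprk.xe.grid =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = pts0101-grid.nl.eu.abnamro.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = pts0138-grid.nl.eu.abnamro.com)(PORT = 1521))
      (LOAD_BALANCE = ON)
    )
    (CONNECT_DATA =
      (SERVICE_NAME = dbprk.xe.grid)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20))
    )
  )
EOF
}
```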

Now another basic test. What is the current situation?

HA Resource                                   Target     State
-----------                                   ------     -----
oss.xdbprk.db-cold-failover                     ONLINE     ONLINE on pts0101
oss.xdbprk.rg                                   ONLINE     ONLINE on pts0101
oracle@pts0101([crs]):/ora/dbprk/admin/xdbprk1/bdump> crs_start  oss.xdbprk.db-cold-failover
Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0101`
Start of `oss.xdbprk.db-cold-failover` on member `pts0101` succeeded.

In which instance will my session end up?

oracle@pts0101(xdbprk1):/ora/dbprk/admin/xdbprk1/bdump> sqlplus rk/rk@dbprk.xe.grid
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 12:39:55 2009
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,
Data Mining and Real Application Testing options
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
xdbprk1

And after the relocate, the session should go to the other instance:

oracle@pts0101([crs]):/ora/dbprk/admin/xdbprk1/bdump> crs_relocate -f oss.xdbprk.db-cold-failover
Attempting to stop `oss.xdbprk.db-cold-failover` on member `pts0101`
Stop of `oss.xdbprk.db-cold-failover` on member `pts0101` succeeded.
Attempting to stop `oss.xdbprk.rg` on member `pts0101`
Stop of `oss.xdbprk.rg` on member `pts0101` succeeded.
Attempting to start `oss.xdbprk.rg` on member `pts0138`
Start of `oss.xdbprk.rg` on member `pts0138` succeeded.
Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0138`
Start of `oss.xdbprk.db-cold-failover` on member `pts0138` succeeded.
oracle@pts0101([crs]):/ora/dbprk/admin/xdbprk1/bdump> crsstat
HA Resource                                   Target     State
-----------                                   ------     -----
oss.xdbprk.db-cold-failover                     ONLINE     ONLINE on pts0138
oss.xdbprk.rg                                   ONLINE     ONLINE on pts0138
oracle@pts0101(xdbprk1):/ora/dbprk/admin/xdbprk1/bdump> sqlplus rk/rk@dbprk.xe.grid
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 12:41:22 2009
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,
Data Mining and Real Application Testing options
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
xdbprk2

As an extra test, to make sure the two instances cannot run concurrently, I tried to start xdbprk1 on pts0101 while xdbprk2 was running on pts0138.
As you can see, this is not possible:

oracle@pts0101(*):/var/opt/oracle> db xdbprk1
ORACLE_SID=xdbprk1
ORACLE_HOME=/ora/product/10.2.0/db_2
oracle@pts0101(xdbprk1):/var/opt/oracle> sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 12:42:02 2009
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Connected to an idle instance.
SQL> startup;
ORACLE instance started.
Total System Global Area  536870912 bytes
Fixed Size                  2085360 bytes
Variable Size             331353616 bytes
Database Buffers          197132288 bytes
Redo Buffers                6299648 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced

Rene Kundersma
Oracle Technology Services, The Netherlands

September 19, 2009

11gR2 Grid Infrastructure Installation

There is so much to tell about the new features in 11gR2 that this release gives me input for years! Since the "Oracle Database New Features Guide 11g Release 2" does a good job here, I will not even try to cover all of that.

I will however try to discuss some highlights or cool new things that changed since the previous (11gR1) release. 11gR2 Grid Infrastructure is one of those things.

11gR2 Grid Infrastructure combines Clusterware and ASM in one Oracle home and can be described as the next step in Grid Computing. If you are familiar with previous Clusterware and ASM releases, you will recognize the new functionality and way of working and realize this is indeed the next step in what we need for enabling Enterprise Grid. Deployment is simpler, faster and we are not talking about nodes anymore, but about services that live on resources.

One of the new features of 11gR2 is Grid Plug and Play, also called GPnP. Let me repeat what the documentation says about GPnP:

"Grid Plug and Play (GPnP) eliminates per-node configuration data and the need for explicit add and delete nodes steps. This allows a system administrator to take a template system image and run it on a new node with no further configuration. This removes many manual operations, reduces the opportunity for errors, and encourages configurations that can be changed easily. Removal of the per-node configuration makes the nodes easier to replace, because they do not need to contain individually-managed state.

Grid Plug and Play reduces the cost of installing, configuring, and managing database nodes by making their per-node state disposable. It allows nodes to be easily replaced with regenerated state"

Some of the key enablers for GPnP are GNS and DHCP. GNS, the Grid Naming Service, is described here.

Since all of the requirements for a Grid Infrastructure installation are clearly documented in the "Grid Infrastructure Installation Guide", there is no need to discuss them here.

This posting, however, demonstrates how to do an "Advanced Installation" of the Grid Infrastructure yourself, for educational purposes: for example at home, where you want to test the setup of Oracle Grid Infrastructure with your own DNS and DHCP server. In real life, at customer sites, DNS and DHCP servers are already in place and Oracle Grid Infrastructure can leverage these existing services.

Since most steps of the Oracle Grid Infrastructure installation are easy, I will only focus on the details I want to discuss regarding GNS and DHCP.

Oracle Grid Infrastructure can be downloaded here; once you have made sure all prerequisites are met, you can start the installation by executing runInstaller.

It does make sense to install the Oracle Grid Infrastructure with a different user ID than the Oracle database; again, the Oracle documentation has some sound examples. Because of this I had to make sure that directories and, for example, the ASM disks are set up with 'grid' ownership instead of 'oracle' (with oinstall as the group in both cases).

I used user "grid" to install the Oracle Grid Infrastructure and since I wanted to install and configure the Oracle Grid Infrastructure I chose the first option.

inst-00154.jpg

A typical installation does not configure GNS, and since the purpose of this posting is to explain the setup with GNS, I chose the "Advanced Installation" option.

inst-00155.jpg

Language: speaks for itself.

inst-00156.jpg

Okay, this is basically the most important step of the setup. At this step you have to define the name of your cluster. In my case it is "cluster01"; that is an easy one, as nothing else depends on it.

The SCAN name, however, is the "Single Client Access Name" and will be set up by the Oracle Grid Infrastructure. This SCAN name will resolve to three IP addresses within the cluster. The good news is that you don't have to do much for it: just make up a name that your clients will use later to access databases in your cluster. The SCAN port, 1521, is the default port we always use for SQL*Net. The SCAN name has to be in the GNS subdomain, as explained below:

inst-00157.jpg

I checked the option "Configure GNS". If this box was not checked, SCAN could still be used, but then I would have had to set up the SCAN entries in DNS myself, with the SCAN name resolving to three different IP addresses.

However, since GNS is checked, the Grid Naming Service will be configured and GNS will set up my SCAN name. The only requirement is that a GNS subdomain is created and the DNS is configured to delegate each request for this subdomain to GNS, so that GNS can handle the request.

The GNS VIP address is the virtual IP address on which GNS will listen; it is hosted by one of the cluster nodes. You need to make sure this address is available for use.

You may ask yourself why all this is needed. Well, imagine a cluster where nodes are added and removed dynamically. In this situation, all IP address management and name resolution is done by the cluster itself. There is no need for manual work such as updating connection strings or configuring IP addresses.

inst-00158.jpg

So how does it work?

First, my DNS (named, on Linux) is running on 10.161.102.40.
This DNS does the naming for cluster01.nl.oracle.com and pts.local.
For cluster01.nl.oracle.com a "delegation" is made, so that every request for a name in the cluster01.nl.oracle.com domain is delegated to GNS (at the GNS VIP).

In DNS:

cluster01.nl.oracle.com.      IN NS  gns.cluster01.nl.oracle.com.
gns.cluster01.nl.oracle.com.  IN A   10.161.102.55

So, once the cluster installation is done, GNS is started in the cluster and a request for scan.cluster01.nl.oracle.com is forwarded to GNS. GNS then handles the request and answers with the three IP addresses on which the SCAN listeners run:

[root@gridnode01pts05 ~]# nslookup scan.cluster01.nl.oracle.com
Server:         10.161.102.40
Address:        10.161.102.40#53
Non-authoritative answer:
Name:   scan.cluster01.nl.oracle.com
Address: 10.161.102.78
Name:   scan.cluster01.nl.oracle.com
Address: 10.161.102.79
Name:   scan.cluster01.nl.oracle.com
Address: 10.161.102.77

Also, with dig, you can see all information coming from GNS:

[root@dns-dhcp ~]# dig scan.cluster01.nl.oracle.com
; <<>> DiG 9.3.4-P1 <<>> scan.cluster01.nl.oracle.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46016
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 10, ADDITIONAL: 10
;; QUESTION SECTION:
;scan.cluster01.nl.oracle.com.  IN      A
;; ANSWER SECTION:
scan.cluster01.nl.oracle.com. 6 IN      A       10.161.102.78
scan.cluster01.nl.oracle.com. 6 IN      A       10.161.102.79
scan.cluster01.nl.oracle.com. 6 IN      A       10.161.102.77
;; AUTHORITY SECTION:
oracle.com.             10732   IN      NS      dns2.us.oracle.com.
oracle.com.             10732   IN      NS      dns3.us.oracle.com.
oracle.com.             10732   IN      NS      dns4.us.oracle.com.
oracle.com.             10732   IN      NS      dns1-us.us.oracle.com.
oracle.com.             10732   IN      NS      dnsmaster1.oracle.com.
oracle.com.             10732   IN      NS      dnsmaster2.oracle.com.
oracle.com.             10732   IN      NS      dnsmaster3.oracle.com.
oracle.com.             10732   IN      NS      dnsmaster4.oracle.com.
oracle.com.             10732   IN      NS      dnsmaster5.oracle.com.
oracle.com.             10732   IN      NS      dnsmaster6.oracle.com.
;; ADDITIONAL SECTION:
dns2.us.oracle.com.     3984    IN      A       130.35.249.52
dns3.us.oracle.com.     3984    IN      A       144.20.190.70
dns4.us.oracle.com.     3984    IN      A       138.2.202.15
dns1-us.us.oracle.com.  3984    IN      A       130.35.249.41
dnsmaster1.oracle.com.  1060    IN      A       192.135.82.4
dnsmaster2.oracle.com.  1060    IN      A       192.135.82.20
dnsmaster3.oracle.com.  1060    IN      A       192.135.82.36
dnsmaster4.oracle.com.  1060    IN      A       192.135.82.52
dnsmaster5.oracle.com.  1060    IN      A       192.135.82.70
dnsmaster6.oracle.com.  1060    IN      A       192.135.82.84
;; Query time: 0 msec
;; SERVER: 10.161.102.40#53(10.161.102.40)
;; WHEN: Sat Sep 19 17:15:47 2009
;; MSG SIZE  rcvd: 486
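To check quickly that the SCAN name really resolves to three addresses, the dig answer can be counted. A minimal sketch; count_ipv4_answers is a hypothetical helper that reads dig +short output (one answer per line) from stdin.

```bash
#!/bin/bash
# Sketch: count the distinct IPv4 addresses in "dig +short" output.
count_ipv4_answers() {
  grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' | sort -u | wc -l | tr -d ' '
}
```

With the setup above, dig +short scan.cluster01.nl.oracle.com | count_ipv4_answers should print 3.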


Later, when the database is installed, you can use the SCAN with SQL*Net EZConnect to connect to the database. Can't wait, I just have to demo it now:

[oracle@gridnode01pts05 ~]$ sqlplus system/oracle@scan.cluster01.nl.oracle.com:1521/dbpts05.pts.local
SQL*Plus: Release 11.2.0.1.0 Production on Sat Sep 19 17:11:32 2009
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> 

After specifying the GNS details, you enter the details of the nodes in the cluster. The node names you enter here must also be resolvable. The virtual IP addresses are managed automatically, as long as a working DHCP service is available to serve IP addresses on that network. You can see here that the nodes can be in a different domain than the GNS subdomain.

inst-00159.jpg

Click here for my dhcp config.
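The linked configuration is not reproduced on this page. As a hypothetical illustration only (not the configuration used here), a minimal ISC dhcpd fragment for the 10.161.102.x network could look like this; the /24 netmask and the address range are assumptions, and dhcpd_fragment is a hypothetical helper that just prints it.

```bash
#!/bin/bash
# Hypothetical dhcpd.conf fragment; netmask and range are assumptions.
# The range leaves out statically assigned addresses such as the DNS
# server (10.161.102.40) and the GNS VIP (10.161.102.55), and must be
# large enough for the node VIPs and the three SCAN VIPs.
dhcpd_fragment() {
  cat <<'EOF'
ddns-update-style none;
subnet 10.161.102.0 netmask 255.255.255.0 {
  range 10.161.102.60 10.161.102.99;
  option domain-name-servers 10.161.102.40;
  default-lease-time 86400;
  max-lease-time 86400;
}
EOF
}
```

Written to a file, the syntax can be checked with dhcpd -t -cf <file> before merging it into /etc/dhcpd.conf.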

New in 11gR2 is the ability to let the installer configure SSH for you. Great!

inst-00160.jpg

Even though it says it will take several minutes, most of the time the SSH setup is done within a minute.

inst-00161.jpg

As in 10g and 11gR1, this step is used to specify the public and private interfaces.
Again as in 10g and 11gR1, the private interface is used for the interconnect.

inst-00162.jpg

11gR2 has the new option to place the OCR and Voting disks on ASM storage, so that is what I will do.

inst-00163.jpg

After choosing ASM, the next step is specifying the disks that will be used for the ASM diskgroup. This is similar to 10g/11gR1; however, back then this step was done in DBCA.

inst-00164.jpg

Choosing 6 disks of 2GB with external redundancy.

inst-00165.jpg

Here you see the installer complaining about the not-so-complicated password I chose (since this is not for production purposes).

inst-00166.jpg

inst-00167.jpg

Intelligent Platform Management Interface (IPMI) support is really cool, look at this. You can choose to let the Grid Infrastructure work with it.

inst-00168.jpg

In the real world it does make sense to separate the three groups!

inst-00169.jpg

inst-00170.jpg

Screen to specify the Oracle base and software location for the Grid Infrastructure. Remember, this location has to exist (and be readable and writable) on all nodes you plan to install the software on.

inst-00171.jpg

Location of the inventory

inst-00172.jpg

This is really great: the prerequisite checker can now generate a "fix-up" script.
Some (not all) requirements that are not met can be corrected by running the fix-up script generated by the prerequisite checker.

inst-00173.jpg

In my situation, some kernel parameters needed to be changed. That could be done easily with this tool.

inst-00174.jpg

Run the script on both nodes.

inst-00175.jpg

My swap space seems to be 1KB too small. This is because I used an Oracle VM template (yes, although it is not officially certified yet, I am running virtualized). I think I will manage with 1KB less than recommended.

inst-00176.jpg

Summary:

inst-00177.jpg

Progress:

inst-00179.jpg

Software being transferred to the other node:

inst-00180.jpg

Next, the root scripts are run to set up permissions and configure the cluster.

inst-00181.jpg

Running oraInstRoot.sh on both nodes:

inst-00182.jpg

Running root.sh on node 1, this is where the cluster configuration is done.

inst-00183.jpg

Watch the Voting File on the first ASM disk in diskgroup DATA:

inst-00184.jpg

And root.sh on node 1 finished...

inst-00185.jpg

Don't forget node 2:

inst-00186.jpg

inst-00187.jpg

After running the root scripts, click OK. The installer then performs some small post-configuration steps, and cluvfy also runs. If all of this runs fine, as it did here, the installer quickly continues to the last page, telling you the installation was successful.

inst-00188.jpg

In my next postings I will show the installation and creation of an 11gR2 RAC database, as well as adding nodes to the cluster and adding instances to the RAC database.

Rene Kundersma
Oracle Technology Services, The Netherlands

September 24, 2009

DBM / Exadata on Planboard Symposium Nov 17th 2009

Dutch Oracle DBAs should not miss the next Planboard Symposium.
Check it out here

For me this is a great opportunity to talk about DBM v1 & v2.

Rene Kundersma
Oracle Technology Services, The Netherlands

About September 2009

This page contains all entries posted to Oracle XPS The Netherlands On HA in September 2009. They are listed from oldest to newest.
