GRID Archives

February 1, 2009

Mass deployment for the Oracle Enterprise Manager Agent


In an Oracle Grid environment, one of the first things you do once your nodes have an OS installed is install the Oracle Enterprise Manager Grid Control Agent.

Once this Agent is in place you can start deploying the Oracle Stack from Enterprise Manager.

As always, there are multiple ways to do this. Here I would like to describe one specific, quick and easy way: the Downloadable Agent.

In short this is how it works:

You have an already running EMGC (OMS) in place. This OMS provides an HTTP URL from which the Agent software is downloaded. The download is done by a script called agentDownload.platform (where "platform" is the name of your OS, like linux).

This script uses wget to download the installation files and a "response" file, and then installs the agent automatically (silently).

All of this can be done within two minutes!

The only thing you have to do is go to the OMS_HOME/sysman/agent_download directory.
From here, copy the script over to your node and run it. A good idea would be to include the script in your OS build by default.

The script needs at least three arguments:
1. -m for the location of the OMS (host name or IP address)
2. -r for the port number of that OMS
3. -b for the Oracle Base.

From this point on, everything is done automatically.

To give you some insight, some formatted output of one session is displayed below:


[oracle@gridnode04 scratch]$ ./agentDownload -m nlgrid02 -r 4889 -b /u01/app
agentDownload invoked on Tue Jan 27 19:45:25 CET 2009 with Arguments "-m nlgrid02 -r 4889 -b /u01/app"
Platform=Linux.i686, OS=linux
GetPlatform:returned=0, and os is set to: linux, platform=Linux.i686
Creating /scratch/agentDownload10.2.0.4.0Oui ...
LogFile for this Download can be found at: "/scratch/agentDownload10.2.0.4.0Oui/agentDownload012709194525.log"
Running on Selected Platform: Linux.i686
Installer location: /scratch/agentDownload10.2.0.4.0Oui
Downloading Agent install response file ...
--19:45:25-- http://nlgrid02:4889/agent_download/10.2.0.4.0/agent_download.rsp
Resolving nlgrid02... 192.168.200.179
Connecting to nlgrid02|192.168.200.179|:4889... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20066 (20K) [text/plain]
Saving to: `agent_download.rsp'

100%[===========================================================================================>] 20,066 --.-K/s in 0s

19:45:25 (126 MB/s) - `agent_download.rsp' saved [20066/20066]

Finished Downloading with Status=0
Downloaded response with status=0

Provide the Agent Registration password so that the Management Agent can communicate with Secure Management Service.
Note: You may proceed with the installation without supplying the password; however, Management Agent can be secured manually after
the installation.
If Oracle Management Service is not secured, agent will not be secured, so continue by pressing Enter Key.

Enter Agent Registration Password:
Downloading Oracle Installer ...
--19:45:30-- http://nlgrid02:4889/agent_download/10.2.0.4.0/linux/oui/oui_linux.jar
Resolving nlgrid02... 192.168.200.179
Connecting to nlgrid02|192.168.200.179|:4889... connected.
HTTP request sent, awaiting response... 200 OK
Length: 44236848 (42M) [application/java-archive]
Saving to: `oui_linux.jar'

100%[===========================================================================================>] 44,236,848 117M/s in 0.4s

19:45:31 (117 MB/s) - `oui_linux.jar' saved [44236848/44236848]

Downloaded Oracle Installer with status=0
Downloading Unzip Utility ...
--19:45:31-- http://nlgrid02:4889/agent_download/10.2.0.4.0/linux/agent/install/unzip
Resolving nlgrid02... 192.168.200.179
Connecting to nlgrid02|192.168.200.179|:4889... connected.
HTTP request sent, awaiting response... 200 OK
Length: 101448 (99K) [text/plain]
Saving to: `unzip'

100%[===========================================================================================>] 101,448 --.-K/s in 0.001s

19:45:31 (127 MB/s) - `unzip' saved [101448/101448]

Adding execute permissions to unzip ...
Downloaded UnzipUtility with status=0
Verifying Installer jar ...
Verified InstallerJar with status=0
Unjarring Oracle Installer ...
Archive: /scratch/agentDownload10.2.0.4.0Oui/oui_linux.jar
creating: Disk1/stage/
creating: Disk1/stage/fastcopy/
inflating: Disk1/stage/fastcopy/setperms1.sh
inflating: Disk1/stage/fastcopy/oracle.swd_Complete_1.xml
inflating: Disk1/stage/fastcopy/oracle.swd_Complete_exp_1.xml

Installation in progress (Tuesday, January 27, 2009 7:45:49 PM CET)
............................................................... 35% Done.
............................................................... 70% Done.
................... 81% Done.
Install successful

Linking in progress (Tuesday, January 27, 2009 7:46:31 PM CET)
Link successful

Setup in progress (Tuesday, January 27, 2009 7:46:47 PM CET)
........... 100% Done.
Setup successful
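
The URLs in the session above all follow the same pattern: OMS host, port, the agent_download directory and the agent version. As a minimal sketch (the function name agent_url is made up for this illustration and is not part of the Oracle tooling), the pattern can be written down like this:

```shell
# Build a download URL following the pattern visible in the session log:
#   http://<oms_host>:<port>/agent_download/<version>/<file>
# agent_url is a hypothetical helper, not something shipped with the OMS.
# Arguments: $1 = OMS host, $2 = port, $3 = agent version, $4 = file path
agent_url() {
    printf 'http://%s:%s/agent_download/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Example: the response file that is requested first in the log.
agent_url nlgrid02 4889 10.2.0.4.0 agent_download.rsp
```

Passing such a URL to wget --spider is one way to check that the OMS is reachable from a node before running the script there.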



Of course, there are more options, and more to tell.
For this I would like to refer you to OTN, where excellent documentation is available.

Rene Kundersma
Oracle Expert Services, The Netherlands


March 4, 2009

Need shared storage fast? Use the Linux Target Framework


For all of us who need (shared) (iSCSI) storage for test or education purposes and don't want to install, for example, OpenFiler (which is still a great solution), there is now the Linux Target Framework (tgt).

In short, tgt consists of a daemon and utilities that allow you to quickly set up (shared) storage.
Tgt can be used for more; however, my example is purely focused on setting up shared iSCSI storage.

First, install the tgt software, which is available in Oracle Enterprise Linux 5.

[root@gridnode05 tmp]# rpm -i scsi-target-utils-0.0-0.20070620snap.el5.i386.rpm 


Then, start the tgtd daemon.

[root@gridnode05 tmp]# service tgtd start
Starting SCSI target daemon: [  OK  ]


Export a new iSCSI target

[root@gridnode05 tmp]# tgtadm --lld iscsi --op new \
--mode target --tid 2 -T 192.168.200.173:rkvol


Create the storage to export. Let's make it 100MB in size.
This will be the actual storage that the initiator will see.
In a normal situation you would use a real block device or an LVM volume.

[root@gridnode05 tmp]# dd if=/dev/zero of=/scratch/rk.vol bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.367602 seconds, 285 MB/s


Add the "storage volume" to the target:


[root@gridnode05 tmp]# tgtadm --lld iscsi --op new --mode logicalunit \
--tid 2 --lun 1 -b /scratch/rk.vol


Allow all initiator clients to use the target:


[root@gridnode05 tmp]# tgtadm --lld iscsi --op bind --mode target --tid 2 -I ALL
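
The target-side steps above can be collected in one small script. This is only a sketch: the helper names run and setup_target are made up, and with DRY_RUN=1 (the default here) the commands are printed instead of executed, so it can be tried without root or a running tgtd. It uses only the tgtadm and dd options already shown above.

```shell
# Dry-run wrapper: print the command when DRY_RUN=1, execute it otherwise.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_target TID NAME BACKING_FILE SIZE_MB   (hypothetical helper)
# Creates the target, a file of SIZE_MB megabytes as backing store,
# attaches that file as LUN 1 and allows all initiators to connect.
setup_target() {
    tid=$1; name=$2; backing=$3; size_mb=$4
    run tgtadm --lld iscsi --op new --mode target --tid "$tid" -T "$name"
    run dd if=/dev/zero of="$backing" bs=1M count="$size_mb"
    run tgtadm --lld iscsi --op new --mode logicalunit --tid "$tid" --lun 1 -b "$backing"
    run tgtadm --lld iscsi --op bind --mode target --tid "$tid" -I ALL
}

# Same values as in the example above.
setup_target 2 192.168.200.173:rkvol /scratch/rk.vol 100
```

Running it with DRY_RUN=0 as root would execute the same four commands for real.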



On the client install the iSCSI initiator:


[root@gridnode03 ~]# rpm -i /tmp/iscsi-initiator-utils-6.2.0.868-0.7.el5.i386.rpm


After installation, enable and start the iscsi service:

[root@gridnode03 ~]# chkconfig iscsi on

[root@gridnode03 ~]# service iscsi start
iscsid is stopped
Turning off network shutdown. Starting iSCSI daemon: [  OK  ]
[  OK  ]
Setting up iSCSI targets: iscsiadm: No records found!
[  OK  ]


Discover the iscsi device:


[root@gridnode03 ~]# iscsiadm -m discovery -t sendtargets -p 192.168.200.175
192.168.200.175:3260,1 192.168.200.173:rkvol


Restart the iscsi service and notice the target coming in:


[root@gridnode03 ~]# service iscsi restart
Stopping iSCSI daemon: /etc/init.d/iscsi: line 33: 29176 Killed /etc/init.d/iscsid stop
iscsid dead but pid file exists
Turning off network shutdown. Starting iSCSI daemon: [ OK ]
[ OK ]
Setting up iSCSI targets: Logging in to [iface: default,
target: 192.168.200.173:rkvol, portal: 192.168.200.175,3260]
Login to [iface: default, target: 192.168.200.173:rkvol,
portal: 192.168.200.175,3260]: successful
[ OK ]



Watch the block device appear in the messages file:


[root@gridnode03 ~]# tail -f /var/log/messages
Mar 4 13:24:19 gridnode03 last message repeated 2 times
Mar 4 13:24:19 gridnode03 iscsid: connection1:0 is operational now
Mar 4 13:24:19 gridnode03 kernel: SCSI device sda: 204800 512-byte hdwr sectors (105 MB)
Mar 4 13:24:19 gridnode03 kernel: sda: Write Protect is off
Mar 4 13:24:19 gridnode03 kernel: SCSI device sda: drive cache: write back
Mar 4 13:24:19 gridnode03 kernel: SCSI device sda: 204800 512-byte hdwr sectors (105 MB)
Mar 4 13:24:19 gridnode03 kernel: sda: Write Protect is off
Mar 4 13:24:19 gridnode03 kernel: SCSI device sda: drive cache: write back
Mar 4 13:24:19 gridnode03 kernel: sda: unknown partition table
Mar 4 13:24:19 gridnode03 kernel: sd 0:0:0:1: Attached scsi disk sda



Verify the size of the block device

[root@gridnode03 ~]# fdisk -l /dev/sda

Disk /dev/sda: 104 MB, 104857600 bytes
4 heads, 50 sectors/track, 1024 cylinders
Units = cylinders of 200 * 512 = 102400 bytes

Disk /dev/sda doesn't contain a valid partition table
[root@gridnode03 ~]#
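
The numbers reported on the initiator line up with the file created on the target. A quick check of the arithmetic (100 blocks of 1M written by dd, 512-byte sectors as reported by the kernel):

```shell
# 100 MB backing file, expressed in bytes and in 512-byte sectors.
size_mb=100
bytes=$((size_mb * 1024 * 1024))
sectors=$((bytes / 512))
echo "bytes=$bytes sectors=$sectors"
# prints: bytes=104857600 sectors=204800
```

These are the 104857600 bytes shown by dd and fdisk, and the 204800 sectors in the kernel messages.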

This seems a very neat utility for obtaining shared storage for education or testing purposes.

Rene Kundersma
Oracle Expert Services, The Netherlands

March 11, 2009

Provisioning your GRID with Oracle VM Templates

Introduction (Chapter 1)

Linux node installation and configuration (virtualized or not) for an Oracle Grid environment can be done in various ways. Of course, one could do this all manually, but for larger environments this would not be feasible.

Also, you want to make sure each installation has the same specifications, and you want to keep human errors during the installation to a minimum.

This blog entry is divided into chapters that describe all the details of an automated Oracle VM cloning process.

The setup as described below is used to prepare education environments. It will also work for proof of concept environments, and most parts of it may even be usable in your own Grid deployment strategy.

The setup described allows you to set up GRID environments that students can use to learn (for instance) how to install RAC, configure DataGuard, or work with Enterprise Manager Grid Control. It can also be used to teach students how to work with Swingbench or FCF, all within their own infrastructure.

This virtualized solution helps to quickly set up, repair, catch up, restore and adapt the setup. It will save your IT department costs on hardware and storage and it will save you lots of time.

The pictures on this page are best viewed with Firefox.

Bare metal provisioning

Within the Oracle Grid, Oracle Enterprise Manager Grid Control release 10.2.0.4 with kickstart and PXE boot is used more and more these days to do a so-called "bare metal" installation of the OS:

kix.gif

After this bare metal installation, "post configuration scripts" take care of the node specific settings.

Even with the use of Oracle Virtual Machines on top of such a node, the kickstart procedure can still be used; without too much effort a PXE boot configuration for virtualized guests can be set up.

This way of "bare metal installation", or better "virtual metal installation", by PXE boot for VM guests is a nice solution, which I will describe one day. But why would one do a complete installation for each VM when the VMs differ only in a couple of configuration files?

This blog entry explains how to use an Oracle VM template to provision virtual guest operating systems for use in a Grid.

For educational purposes, where classes with a lot of students have to work each with their own Grid environment, a procedure is worked out to provision a blade system with operating systems and software, Grid ready, all based on Oracle templates.

As said, more options are possible. This is how my solution works; it may work for you as well:
1. An example OS configuration is provided (node specific configuration files). From these template files a VM guest specific configuration is generated automatically. This configuration describes settings for hostname, IP addresses, etc.
2. A VM template (image) is provided.

By automating the two steps above, one can easily and quickly set up virtualized Oracle Linux nodes, ready for RAC.

The next chapter is about the configuration templates and the cloning process.

The process (Chapter 2)

With the configuration templates described earlier, "configuration clones" can be made. In this example I am using HP blade technology. On each blade six VMs will be running. For each blade, and for each VM running on top of it, the configuration files are generated.

It makes sense to define configuration templates. With the use of scripts you could use these templates and generate configuration files for each specific vm.

With a VM template in one hand, and an automatically generated set of configuration files in the other you can quickly build, or rebuild the infrastructure over and over again.

Even if you need to make changes that affect all VMs, they can be rolled out quite quickly.

As said, this solution is extremely useful for education purposes, or situations where you have to provide lots of VM guests ready to be used instantly. Possible other uses are in proof of concept environments.

In short, the workflow of the cloning process looks like this:
1. A default virtual machine image is copied over.
2. Configuration files for the VM are generated, based upon the blade number, the VM number and the purpose of the VM.
3. The VM image is "mounted" and configuration files are overwritten with the generated configuration files. Binaries (other programs) are also put in place.
4. The VM image is unmounted and, if needed, "file based shared storage" is created.
5. The VM boots for the first time, ready to use immediately, totally pre-configured.


The concept itself can of course also be used for the Linux provisioning of your virtualized infrastructure as an alternative to bare metal provisioning.

The next chapter will describe the hardware used and the chosen storage solutions for this example.

Hardware used (Chapter 3)

As discussed in the previous chapter, this project is built on HP blade technology.

The solution described is of course independent of the hardware chosen.

However, in order to describe the complete setup this chapter is here to describe the hardware used.

blade01.JPG

This blade enclosure (C3000) has eight blades, each blade has:
- two nics (broadcom)
- two hba's (qlogic)
- 16 GB of RAM
- two quad core Intel Xeon processors


Storage is made available to the blades by NFS and Fibre Channel.

The NFS share is used to provide the VM template that will be used as source.

The same NFS share is also available to the VM guests in order to provide the guests the option to install software from a shared location.

The SAN storage comes from an HP MSA. These MSA devices are used for OCFS2, which is where the VM image files are placed.

Each blade is reachable through a public network interface.

A private network is also set up as interconnect network for OCFS2 between the blades.

For each blade the architecture is as shown in the diagram below.

blade02.jpg

VM distribution (Chapter 4)

As said in an earlier chapter, each blade has 16GB RAM, so this is enough to run at least 6 VMs of 2GB RAM each.

The purpose is to have:
- 3 vms for Real Application Clusters (RAC) (11.1.0.7 CRS/ASM/RDBMS)
- 1 vm for Dataguard (11.1.0.7 ASM/RDBMS)
- 1 vm to run swingbench and demo applications
- 1 vm to run Enterprise Manager grid Control (EMGC).


This will look like this:
blade03.jpg

As each blade has 146 GB of local storage, there is room to have some VMs on local disks. Since there is no intention to live migrate these nodes, they can be put on a non-shared location.

VM number six (EMGC) is too big to fit next to the other VMs on local storage. For this reason a shared OCFS2 mount is made.

Each VM uses the Oracle VM provided location for the VMs (/OVS/running_pool). With symbolic links the storage for the EMGC VM is brought to the OCFS2 shared disk: GRIDNODE09 -> /OVS_shared_large/running_pool/oemgc/nlhpblade07

By default OCFS2 allows four nodes to mount an OCFS2 filesystem concurrently. In order to mount the OCFS2 filesystem on all blades concurrently you have to pass the -N X argument to mkfs, where X is the maximum number of nodes that will ever mount the OCFS2 filesystem concurrently.

mkfs.ocfs2 -b 4K -C 32K -N 8 -L ovmsdisk /dev/sdb1


PV Templates (Chapter 5)

Before doing any specific VM changes, first a template is chosen, in this case Oracle Enterprise Linux 5 update 2 (OEL5U2).

This is an Oracle VM template downloaded from OTN.

Our template is a para-virtualized template, based on a 32bit architecture.

To remind you, this is how the para-virtualized architecture looks:

blade04.jpg

By the way, para-virtualized guests often run faster than hardware virtualized guests.

Please see this link for more information on hardware vs. para-virtualized guests.

As part of the procedure described, the template will be copied over six times to each blade. In order to use the VMs on a specific blade for a specific purpose configuration files must be made. The next chapter describes how this works.

VM Specific files and clone procedure (Chapter 6)

Each virtualized guest has a small set of configuration files that are specific to that OS. Typically these files exist both outside of the guest (vm.cfg) and inside the guest.

Specific files inside the vm:
- /etc/sysconfig/network-scripts/ifcfg-eth*
- /etc/sysconfig/network
- ssh configuration files

Specific files outside the vm:
- vm.cfg

For VMs running on the same blade (and being part of the same 'grid') there are also files in common:
- nsswitch.conf
- resolv.conf
- sudoers
- sysctl.conf
- hosts

The files mentioned above need to be changed, because each machine needs its own NICs with specific MAC addresses and its own IP addresses.

Of course, within a grid (on a blade) each VM has to have a unique name.

In order to make sure unique MAC addresses will be generated, one has to setup standards.

For the MAC addresses, the following formula is used: 00:16:3E:XD:0Y:0Z, where:
X: the number of the blade
Y: the number of the VM
Z: the number of the NIC within that VM

For example, the MAC address for the second NIC on the third VM on blade 7 would look like: HWADDR=00:16:3E:7D:03:02

Host names are used multiple times (but not within the same grid); the only things that need to change are the corresponding IP addresses, which must be unique across the grids.

The same strategy is used to determine the IP addresses to be used:
- For the public network 192.168.200.1XY is used.
- For the internal network 10.0.0.1XY is used
- For the vip 192.168.200.XY is used.

Where:
X: the number of the Blade
Y: the number of the VM

For example:
- the public ip-number of node 3 on blade7 would be: 192.168.200.173
- the private ip-number of node 3 on blade7 would be: 10.0.0.173
- the virtual ip-number of node 3 on blade7 would be: 192.168.200.73
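
The numbering scheme above is easy to express in a few shell functions. This is a minimal sketch: the function names are invented for this example, and it assumes single-digit blade, VM and NIC numbers, as in the examples given.

```shell
# All function names are hypothetical; blade, VM and NIC numbers are
# assumed to be single digits.

# vm_mac BLADE VM NIC -> 00:16:3E:<blade>D:0<vm>:0<nic>
vm_mac()        { printf '00:16:3E:%sD:0%s:0%s\n' "$1" "$2" "$3"; }

# vm_public_ip BLADE VM -> 192.168.200.1<blade><vm>
vm_public_ip()  { printf '192.168.200.1%s%s\n' "$1" "$2"; }

# vm_private_ip BLADE VM -> 10.0.0.1<blade><vm>
vm_private_ip() { printf '10.0.0.1%s%s\n' "$1" "$2"; }

# vm_vip BLADE VM -> 192.168.200.<blade><vm>
vm_vip()        { printf '192.168.200.%s%s\n' "$1" "$2"; }

# The examples from the text: third VM on blade 7, second NIC.
vm_mac 7 3 2          # 00:16:3E:7D:03:02
vm_public_ip 7 3      # 192.168.200.173
vm_private_ip 7 3     # 10.0.0.173
vm_vip 7 3            # 192.168.200.73
```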

So, from here, as long as you know for which blade and for which VM you will be generating the configuration, you can script that:
[root@nlhpblade07 tools]# ./clone_conf.sh nlhpblade01
Copying config files from /OVS_shared_large/conf/nlhpblade07 to /OVS_shared_large/conf/nlhpblade01...
Performing config changes specific to the blade and the VM...
# nlhpblade01 - GRIDNODE01 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE01/ifcfg-eth0
# nlhpblade01 - GRIDNODE01 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE01/ifcfg-eth1
# nlhpblade01 - GRIDNODE01 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE01/network
# nlhpblade01 - GRIDNODE01 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE01/vm.cfg
# nlhpblade01 - GRIDNODE02 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE02/ifcfg-eth0
# nlhpblade01 - GRIDNODE02 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE02/ifcfg-eth1
# nlhpblade01 - GRIDNODE02 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE02/network
# nlhpblade01 - GRIDNODE02 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE02/vm.cfg
# nlhpblade01 - GRIDNODE03 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE03/ifcfg-eth0
# nlhpblade01 - GRIDNODE03 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE03/ifcfg-eth1
# nlhpblade01 - GRIDNODE03 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE03/network
# nlhpblade01 - GRIDNODE03 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE03/vm.cfg
# nlhpblade01 - GRIDNODE04 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE04/ifcfg-eth0
# nlhpblade01 - GRIDNODE04 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE04/ifcfg-eth1
# nlhpblade01 - GRIDNODE04 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE04/network
# nlhpblade01 - GRIDNODE04 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE04/vm.cfg
# nlhpblade01 - GRIDNODE05 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE05/ifcfg-eth0
# nlhpblade01 - GRIDNODE05 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE05/ifcfg-eth1
# nlhpblade01 - GRIDNODE05 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE05/network
# nlhpblade01 - GRIDNODE05 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE05/vm.cfg
# nlhpblade01 - GRIDNODE09 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE09/ifcfg-eth0
# nlhpblade01 - GRIDNODE09 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE09/network
# nlhpblade01 - GRIDNODE09 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE09/vm.cfg
Performing node common changes for the configuration files...
# nlhpblade01 - GRIDNODE09 - /OVS_shared_large/conf/nlhpblade01/common/cluster.conf
# nlhpblade01 - GRIDNODE09 - /OVS_shared_large/conf/nlhpblade01/common/hosts
[root@nlhpblade07 tools]# 
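
The clone_conf.sh script itself is not shown here. To give an idea of what generating one of these files could look like, below is a sketch of a function that prints an ifcfg file for a given blade, VM and NIC. The function name, the mapping of NIC 1 to eth0 (public) and NIC 2 to eth1 (private), and the netmask are all assumptions made for this illustration.

```shell
# render_ifcfg BLADE VM NIC   (hypothetical helper)
# Assumptions: NIC 1 is eth0 on the public network, NIC 2 is eth1 on the
# private network, and both networks use netmask 255.255.255.0.
render_ifcfg() {
    blade=$1; vm=$2; nic=$3
    if [ "$nic" -eq 1 ]; then
        dev=eth0; addr="192.168.200.1${blade}${vm}"
    else
        dev=eth1; addr="10.0.0.1${blade}${vm}"
    fi
    cat <<EOF
DEVICE=$dev
BOOTPROTO=static
HWADDR=00:16:3E:${blade}D:0${vm}:0${nic}
IPADDR=$addr
NETMASK=255.255.255.0
ONBOOT=yes
EOF
}

# Example: public interface of the first VM on blade 1.
render_ifcfg 1 1 1
```

Redirecting the output to a path such as /OVS_shared_large/conf/nlhpblade01/GRIDNODE01/ifcfg-eth0 would produce a file like the ones listed in the output above.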

'mounting a vm' (Chapter 7)

Now that we have generated the node specific configuration files and copied the basic template, we are ready to modify the OS before even booting it. After 'mounting' the VM image file, the generated configuration is copied into the VM.

As said, at this moment the VM is an image file, for example /OVS/running_pool/GRIDNODE01/system.img. Xen will set up a loop device in order to boot the OS from that image.

We do much the same in order to change the OS before we boot it:

First, the losetup command is used to associate a loop device with the file. A loop device is a pseudo-device that makes a file accessible as a block device.
[root@nlhpblade07 GRIDNODE03]#  losetup /dev/loop9 system.img
Now that we have mapped the image file to a block device, we want to see the partitions on it. For this we use the command kpartx, which creates device maps from partition tables. kpartx is part of device-mapper multipath.
[root@nlhpblade07 GRIDNODE03]# kpartx -a /dev/loop9
So, let's see what partitions device-mapper has for us:
[root@nlhpblade07 GRIDNODE03]# ls /dev/mapper/loop9*
/dev/mapper/loop9p1  /dev/mapper/loop9p2  /dev/mapper/loop9p3
kpartx found three partitions and told device-mapper (DM) about them. Let's see if we can identify the types:
[root@nlhpblade07 GRIDNODE03]# file -s /dev/mapper/loop9p1
/dev/mapper/loop9p1: Linux rev 1.0 ext3 filesystem data
This is probably the /boot partition of the vm.
[root@nlhpblade07 GRIDNODE03]# file -s /dev/mapper/loop9p2
/dev/mapper/loop9p2: LVM2 (Linux Logical Volume Manager) , UUID: t2SAm03KoxfUcCOS3OYmsXf9ubqcy9q
This may be the root or the swap partition.
[root@nlhpblade07 GRIDNODE03]# file -s /dev/mapper/loop9p3
/dev/mapper/loop9p3: LVM2 (Linux Logical Volume Manager) , UUID: j2U7KUWen1ePjDvm4hTclZvA5YJyvl9
This may also be the root or the swap partition.

So, in order to make a better guess in finding the root partition, let's see what the sizes are:
[root@nlhpblade07 GRIDNODE03]# fdisk -l /dev/mapper/loop9p2

Disk /dev/mapper/loop9p2: 13.8 GB, 13851371520 bytes
255 heads, 63 sectors/track, 1684 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/mapper/loop9p2 doesn't contain a valid partition table

[root@nlhpblade07 GRIDNODE03]# fdisk -l /dev/mapper/loop9p3

Disk /dev/mapper/loop9p3: 5362 MB, 5362882560 bytes
255 heads, 63 sectors/track, 652 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/mapper/loop9p3 doesn't contain a valid partition table
As we can see, one partition is about 5GB and the other about 13GB. Best guess would be that the 5GB partition is the swap and the 13GB partition the OS.

With the command vgscan we can scan the newly 'discovered' 'disks' and search for volume groups on them:
[root@nlhpblade07 GRIDNODE03]# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "VolGroup00" using metadata type lvm2
vgdisplay says we have one volume group (VolGroup00):
[root@nlhpblade07 GRIDNODE03]# vgdisplay
  --- Volume group ---
  VG Name               VolGroup00
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  5
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               17.84 GB
  PE Size               32.00 MB
  Total PE              571
  Alloc PE / Size       571 / 17.84 GB
  Free  PE / Size       0 / 0   
  VG UUID               kmhYBm-Mpbv-usx2-vDur-rEVb-uP4i-kcP4fc
With the command vgchange -a y we can make the logical volumes available to the kernel:
[root@nlhpblade07 GRIDNODE03]# vgchange -a y VolGroup00
  2 logical volume(s) in volume group "VolGroup00" now active
lvdisplay can be used to see the attributes of a logical volume:
[root@nlhpblade07 GRIDNODE03]# lvdisplay
  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol00
  VG Name                VolGroup00
  LV UUID                B13hk3-f5qY-3gDY-Ackt-13gK-DZDc-cTWx3V
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                14.72 GB
  Current LE             471
  Segments               2
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:3
   
  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol01
  VG Name                VolGroup00
  LV UUID                iEO4oG-XPMU-syWF-qupo-811i-G6Gg-QZEw5f
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                3.12 GB
  Current LE             100
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:4
So, now we have found the root filesystem of the VM and made its logical volume available. Now we can mount it:
[root@nlhpblade07 GRIDNODE03]# mkdir guest_local_LogVol00
[root@nlhpblade07 GRIDNODE03]# mount /dev/VolGroup00/LogVol00 guest_local_LogVol00 
See the contents of the filesystem:
[root@nlhpblade07 GRIDNODE03]# cd guest_local_LogVol00/

[root@nlhpblade07 guest_local_LogVol00]# ls -la
total 224
drwxr-xr-x 26 root root  4096 Jan 14  2009 .
drwxr-xr-x  3 root root  4096 Oct 22 22:30 ..
-rw-r--r--  1 root root     0 Jul 24 05:02 .autorelabel
drwxr-xr-x  2 root root  4096 Dec 20  2008 bin
drwxr-xr-x  2 root root  4096 Jun  6 11:26 boot
drwxr-xr-x  4 root root  4096 Jun  6 11:26 dev
drwxr-xr-x 94 root root 12288 Jan 14  2009 etc
drwxr-xr-x  3 root root  4096 Jun  6 11:50 home
drwxr-xr-x 14 root root  4096 Dec 20  2008 lib
drwx------  2 root root 16384 Jun  6 11:26 lost+found
drwxr-xr-x  2 root root  4096 Apr 21  2008 media
drwxr-xr-x  2 root root  4096 May 22 09:51 misc
drwxr-xr-x  3 root root  4096 Dec 20  2008 mnt
dr-xr-xr-x  2 root root  4096 Jun 10 11:11 net
drwxr-xr-x  3 root root  4096 Aug 21 04:11 opt
-rw-r--r--  1 root root     0 Jan 14  2009 poweroff
drwxr-xr-x  2 root root  4096 Jun  6 11:26 proc
drwxr-x--- 17 root root  4096 Jan 13  2009 root
drwxr-xr-x  2 root root 12288 Dec 20  2008 sbin
drwxr-xr-x  4  500  500  4096 Jan 14  2009 scratch
drwxr-xr-x  2 root root  4096 Jun  6 11:26 selinux
drwxr-xr-x  2 root root  4096 Apr 21  2008 srv
drwxr-xr-x  2 root root  4096 Jun  6 11:26 sys
drwxr-xr-x  3 root root  4096 Jun  6 11:33 tftpboot
drwxrwxrwt  9 root root  4096 Jan 14  2009 tmp
drwxr-xr-x  3 root root  4096 Dec 20  2008 u01
drwxr-xr-x 14 root root  4096 Jun  6 11:31 usr
drwxr-xr-x 21 root root  4096 Jun  6 11:37 var
Although this seems a rather easy way to mount a VM image file, it is still not something you want to do by hand for 40 VM images.

For this reason, the procedure described above is scripted in a script called mount_vm.sh. This is how it works:
[root@nlhpblade07 GRIDNODE05]# mount_vm.sh GRIDNODE05
Starting mount...

contents /etc/sysconfig/network -file of mounted node:
NETWORKING=yes

NETWORKING_IPV6=no
HOSTNAME=gridnode05.nl.oracle.com

Generating unmount script
To unmount your image run  /tmp/umount_GRIDNODE05.30992.sh as root

Mounting finished...
As you can see, the image is mounted by a script, and a script to unmount it is generated automatically. In order to verify that the right image file is mounted, the contents of the file /etc/sysconfig/network are shown.
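
The mount_vm.sh script itself is not listed here. As a rough sketch of the idea, the two functions below print the manual steps from this chapter for one image, plus the matching steps to undo them in reverse order, which is what a generated unmount script could contain. The function names and argument order are assumptions; nothing is executed, the commands are only printed.

```shell
# print_mount_steps IMAGE LOOPDEV VG LV MOUNTPOINT   (hypothetical helper)
# Prints the manual mount steps from this chapter for one image file.
print_mount_steps() {
    echo "losetup $2 $1"          # map the image file to a loop device
    echo "kpartx -a $2"           # create device maps for its partitions
    echo "vgscan"                 # find volume groups on the new devices
    echo "vgchange -a y $3"       # activate the logical volumes
    echo "mkdir -p $5"
    echo "mount /dev/$3/$4 $5"    # mount the root logical volume
}

# print_umount_steps LOOPDEV VG MOUNTPOINT   (hypothetical helper)
# Prints the same steps undone, in reverse order.
print_umount_steps() {
    echo "umount $3"
    echo "vgchange -a n $2"       # deactivate the logical volumes
    echo "kpartx -d $1"           # remove the partition device maps
    echo "losetup -d $1"          # detach the loop device
}

print_mount_steps system.img /dev/loop9 VolGroup00 LogVol00 guest_local_LogVol00
print_umount_steps /dev/loop9 VolGroup00 guest_local_LogVol00
```

Writing the output of print_umount_steps to a file at mount time gives an unmount script that always matches the loop device actually used.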

'changing a vm' (Chapter 8)

Now that the VM image is mounted on the filesystem, we can go back to the generated config files. From here it is easy to copy all specific configuration files into the VM. Better still is to have a script do this, and that is what was done for this solution:
[root@nlhpblade07 GRIDNODE05]# change_vm.sh GRIDNODE05
If you are sure, hit Y or y to continue
Y
Continuing...
Starting config change for VM GRIDNODE05 on nlhpblade07...
Copying swingbench...
Changing ownership of swingbench files...
Copying FCF-Java Demo...
Changing ownership of  FCF-Java Demo files...
This vm requires pre-build file /OVS/sharedDisk/rdbms_home_11r1_01_ocfs.img as shared Oracle RDBMS HOME
Finished changing config...
Now that the VM is modified internally and still mounted, it has to be unmounted. This is done by running the unmount script that was generated during the mount.
[root@nlhpblade07 GRIDNODE05]# /tmp/umount_GRIDNODE05.30992.sh

Unmount finished
If the unmount succeeded, you can remove this file

rm: remove regular file `/tmp/umount_GRIDNODE05.30992.sh'? y


All Together (Chapter 9)

In essence, the procedure described above has to be repeated for each VM you want to clone and change on each blade. This may already save you hours of work and reduce the chance of mistakes, but it may still seem like a lot of steps. Repeating this for each blade and each VM is, from here, just a matter of scripting.

So, you could make a script, that for each blade, would do the following:

Pseudo:
for each blade in blade list
do
 Stop all vms first
     while vms still running
     do 
        wait 10 seconds
     done
 Restore all machines (from NFS)
 Clone conf
 Mount and change all vms
 Start all
done 
For all 42 VM images, the implemented version of the script above takes about four hours to run. After this, a complete 42 node education environment is set up.
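
The implemented script is not listed here. Purely as an illustration of how the pseudocode translates to shell, here is a skeleton in which every step is a stub that only prints what it would do. All function names are made up, and vms_running is a stub that always reports that nothing is running; a real version would have to query the hypervisor on the blade.

```shell
# Hypothetical stubs: each one only prints what a real implementation would do.
stop_all_vms()     { echo "[$1] stop all VMs"; }
vms_running()      { return 1; }   # stub: pretend no VM is running anymore
restore_images()   { echo "[$1] restore all machines from NFS"; }
clone_conf()       { echo "[$1] clone configuration"; }
mount_and_change() { echo "[$1] mount and change all VMs"; }
start_all_vms()    { echo "[$1] start all VMs"; }

# provision_blade BLADE: the body of the loop from the pseudocode.
provision_blade() {
    blade=$1
    stop_all_vms "$blade"
    while vms_running "$blade"; do
        sleep 10                   # wait until all VMs are really down
    done
    restore_images "$blade"
    clone_conf "$blade"
    mount_and_change "$blade"
    start_all_vms "$blade"
}

# Blade names are taken from the command line.
for blade in "$@"; do
    provision_blade "$blade"
done
```

Saved as, say, provision.sh (a hypothetical name), it would be called with the blade names as arguments, for example: sh provision.sh nlhpblade01 nlhpblade02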

NLHPBLADE%20TA3-1.jpg

Extra Options (Chapter 10)

Besides changing only configuration settings on a VM, as described in the 'changing a vm' chapter, other activities can also be carried out.

For this solution the following options are also implemented.
- configure vnc
- configure ocfs2 within the guest, use a shared Oracle home to save space
- copy and configure software (like an oracle home or swingbench)
- create ASM disk files
- create OCR and Voting disks
- configure sudoers
- provide software for EM Agent Deployment
- configure ssh

Rene Kundersma
Oracle Expert Services, The Netherlands

March 24, 2009

'Virtual Metal' Provisioning with Oracle VM and PXE

As mentioned in an earlier blog entry, the basis for Bare Metal Provisioning (BMP) in EMGC 10.2.0.5 is PXE boot.

snap-rac-vm00042.jpg

This blog entry describes how to set up PXE boot (TFTP and DHCP) for para-virtualised guests.
This allows you to automatically install virtualised guests using a kickstart file.

By the way, in this setup I am on OEL 5U2 x86; if you want to reproduce it for, say, x86_64, you may need other packages.

Below are my notes of the setup:
- install dhcp-3.0.5-18.el5
- install tftp-0.42-3.1.0.1 (we need this one later as a required package for pypxeboot)
- install tftp-server-0.42-3.1.0.1

After installation of these packages, we begin with the configuration of dhcp in /etc/dhcpd.conf.
As this is just a test I am not using all options for DHCP.
Be careful if you test this: DHCP may work too well...
#
# DHCP Server Configuration file.
#   see /usr/share/doc/dhcp*/dhcpd.conf.sample  
#
ddns-update-style none;
allow booting; 
allow bootp;   

subnet 192.168.200.0 netmask 255.255.255.0 {
    option routers             192.168.200.1;
    option subnet-mask         255.255.255.0;
    option nis-domain          "nl.oracle.com";
    option domain-name         "nl.oracle.com";
    option domain-name-servers 192.135.82.60;

    default-lease-time 60;
    max-lease-time 60;
 
    next-server 192.168.200.173;
    filename "/pxelinux.0";

    host RK{
    hardware ethernet 00:16:3e:62:39:d3;
    fixed-address 192.168.200.177;
    }
}
As you can see I specified subnet, netmask, domain-name and details for the host called "RK".
Details are: name, mac and ip address.

The purpose of the "next-server" is to specify the name (or ip) of the tftp-server.
It makes sense to put DHCP and TFTP server on the same box.

In order to (re)start dhcp:
service dhcpd restart 
After setting up DHCP, TFTP needs to be set up. This is just a matter of enabling the service in xinetd.
Set disable = no in the file /etc/xinetd.d/tftp. After this, restart the xinetd service.

Pxeboot files need to be copied to /tftpboot on the tftp-server:
cp /usr/lib/syslinux/pxelinux.0 /tftpboot/
cp /usr/lib/syslinux/mboot.c32 /tftpboot/
From your OEL distribution, copy the boot-installation files:
cp $MOUNT_OEL_DISTR/images/xen/* /tftpboot/
Create a PXE configuration file for the guest you want to start:
[root@gridnode03 pxelinux.cfg]# gethostip -x 192.168.200.177
C0A8C8B1
So for a guest with ip-number 192.168.200.177 we need to put the details for the PV-PXE installation into /tftpboot/pxelinux.cfg/C0A8C8B1
[root@gridnode03 ~]# cat /tftpboot/pxelinux.cfg/C0A8C8B1 
default linux
prompt 1
timeout 120
label linux
  kernel vmlinuz
  append initrd=initrd.img lang=en_US keymap=us \
  ks=nfs:192.168.200.200:/vol/vol1/distrib/linux32/workshop-ovs/oel/OEL5U2/ks.cfg \  
  ksdevice=eth0 ip=dhcp
You can see:
- my OEL kickstart-file is on NFS (as my installation)
- the ip number is obtained by DHCP using eth0
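The hexadecimal file name under /tftpboot/pxelinux.cfg is simply the four octets of the guest's ip address written as upper-case hex. If gethostip is not at hand, the same name can be derived with plain shell. A minimal sketch; the function name ip_to_pxe_hex is made up for illustration:

```shell
#!/bin/sh
# Sketch: convert a dotted-quad IPv4 address to the 8-digit upper-case hex
# name that pxelinux looks for in pxelinux.cfg/.
ip_to_pxe_hex() {
    # Split the address on "." into four positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    # Print each octet as two upper-case hex digits.
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

ip_to_pxe_hex 192.168.200.177   # prints C0A8C8B1
```

This matches the gethostip -x output shown above for 192.168.200.177.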

I created my kickstart from an existing OEL installation.
With the help of the command system-config-kickstart --generate I re-generated it.

After this, I had to modify some bits about installation media (from cdrom to nfs).
Specifics for my kickstart file here

See the Redhat site for all options of kickstart.

Before I could start a vm guest, I also had to:
- install pypxeboot and
- install udhcp-0.9.8-1usermac

Then, created a vm configuration file:
[root@nlhpblade07 pxe]# cat rk.cfg 
name = "RK"
memory = "1024"
disk = [ 'file:/OVS/running_pool/pxe/system.img,xvda,w',]
vif = [ 'mac=00:16:3e:62:39:d3,bridge=xenbr0', '', ]
vfb = ["type=vnc,vncunused=1,vnclisten=0.0.0.0"]
#bootloader="/usr/bin/pygrub"
bootloader="/usr/bin/pypxeboot"
bootargs=vif[0]
vcpus=1
on_reboot   = 'restart'
on_crash    = 'restart'
Before I could start the VM, the 'disk' (image) had to be in place:
[root@nlhpblade07 pxe]# dd if=/dev/zero of=system.img bs=1M count=8000
8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB) copied, 165.725 seconds, 50.6 MB/s
[root@nlhpblade07 pxe]# 
So, after starting, remember that the third console of the installation enables you to see what is going on during the run of the anaconda installation procedure:
snap-rac-vm00037.jpg
After installation and before the reboot, the vm-config file had to be modified to look like this:
[root@nlhpblade07 pxe]# cat rk.cfg
name = "RK"
memory = "1024"
disk = [ 'file:/OVS/running_pool/pxe/system.img,xvda,w',]
vif = [ 'mac=00:16:3e:62:39:d3,bridge=xenbr0', '', ]
vfb = ["type=vnc,vncunused=1,vnclisten=0.0.0.0"]
bootloader="/usr/bin/pygrub"
vcpus=1
on_reboot   = 'restart'
on_crash    = 'restart'
snap-rac-vm00040.jpg
After a successful installation the OS is set up and ready to be used:
snap-rac-vm00041.jpg

Rene Kundersma
Oracle Expert Services, The Netherlands

April 15, 2009

Oracle VM: 64-bit RAC support

Oracle announced today that 10g 64-bit RAC is now supported on Oracle VM. Details can be found in this paper.

The whitepaper details VCPU strategies and storage solutions.
The document also describes some best practices you should know when using RAC in an Oracle VM environment.

Check it out:

http://www.oracle.com/technology/products/database/clusterware/pdf/oracle_rac_in_oracle_vm_environments.pdf

Rene Kundersma
Oracle Expert Services, The Netherlands

September 12, 2009

Mounting Oracle VM Templates without LVMs in it.

In blog entry "Provisioning your GRID with Oracle VM Templates" I explained how to mount a filesystem from an Oracle VM template that was built on a logical volume. It seems, however, that Oracle VM templates are also shipped without logical volumes in them, just plain ext3.

So how would one manage to mount that then ?

First, setup the loop, to see what's in it.

[root@pts05 ]# losetup /dev/loop99  /OVS/running_pool/gridnode01/System.img
[root@pts05 ]# fdisk -lu /dev/loop99
Disk /dev/loop99: 6530 MB, 6530871808 bytes
255 heads, 63 sectors/track, 793 cylinders, total 12755609 sectors
Units = sectors of 1 * 512 = 512 bytes
       Device Boot      Start         End      Blocks   Id  System
/dev/loop99p1   *          63       64259       32098+  83  Linux
/dev/loop99p2           64260     8562644     4249192+  83  Linux
/dev/loop99p3         8562645    12739544     2088450   82  Linux swap / Solaris

So, three plain partitions, no logical volumes in it.

What to do next ? It seems NOT possible to inform the kernel about these three partitions:

[root@pts05 p]# partprobe -s /dev/loop99
Error: Error informing the kernel about modifications to partition /dev/loop99p1 -- Invalid argument.  This means Linux won't know about any changes you made to /dev/loop99p1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Error informing the kernel about modifications to partition /dev/loop99p2 -- Invalid argument.  This means Linux won't know about any changes you made to /dev/loop99p2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Error informing the kernel about modifications to partition /dev/loop99p3 -- Invalid argument.  This means Linux won't know about any changes you made to /dev/loop99p3 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Warning: The kernel was unable to re-read the partition table on /dev/loop99 (Invalid argument).  This means Linux won't know anything about the modifications you made until you reboot.  You should reboot your computer before doing anything with /dev/loop99.

Okay, since I don't want to reboot, this is not an option. Maybe we need to set up the loop, but with an offset to the partition I need.

Since the sector size is 512 bytes and the partition I need starts at sector 64260, my offset will be: 512 * 64260 = 32901120. Why start at 64260 ? Because I guess partition one is the /boot partition.
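The same arithmetic can be left to the shell, so the start sector from the fdisk -lu listing can be pasted in directly. A small sketch; the function name partition_offset is made up for illustration, and the sector size defaults to the 512 bytes that fdisk reported:

```shell
#!/bin/sh
# Sketch: byte offset of a partition inside a disk image.
# $1 = start sector as shown by "fdisk -lu", $2 = sector size (default 512)
partition_offset() {
    start_sector=$1
    sector_size=${2:-512}
    echo $((start_sector * sector_size))
}

partition_offset 64260   # prints 32901120

# Possible use with losetup (needs root, so left commented out):
# losetup -o "$(partition_offset 64260)" /dev/loop98 /OVS/running_pool/gridnode01/System.img
```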

[root@pts05 tools]# losetup /dev/loop98  /OVS/running_pool/gridnode01/System.img -o 32901120
[root@pts05 tools]# losetup -a | grep offset
/dev/loop98: [0811]:6127628 (/OVS/running_pool/gridnode01/System.img), offset 32901120

Yes, there it is ! Mounting is easy now !

[root@pts05 p]# mkdir /tmp/p; mount /dev/loop98 /tmp/p/
[root@pts05 p]# df -m | grep loop98
/dev/loop98               4013      2071      1901  53% /tmp/p
[root@pts05 p]# ls -l /tmp/p/
total 196
drwxr-xr-x  2 root root  4096 Sep  3 00:03 bin
drwxr-xr-x  2 root root  4096 Mar 25 05:44 boot
drwxr-xr-x  2 root root  4096 Sep  4 09:58 crs
drwxr-xr-x  2 root root  4096 Apr 10 07:14 dev
drwxr-xr-x 96 root root 12288 Sep 10 07:03 etc
drwxr-xr-x  4 root root  4096 Sep  8 21:30 home
drwxr-xr-x 13 root root  4096 Sep  3 00:03 lib
drwx------  2 root root 16384 Mar 25 05:44 lost+found
drwxr-xr-x  2 root root  4096 Jan  9  2009 media
drwxr-xr-x  2 root root  4096 Jan 21  2009 misc
drwxr-xr-x  3 root root  4096 Sep  4 09:40 mnt
dr-xr-xr-x  2 root root  4096 Sep  8 21:22 net
drwxr-xr-x  4 root root  4096 Sep  9 22:03 opt
drwxr-xr-x  2 root root  4096 Mar 25 05:44 proc
drwxr-x---  2 root root  4096 Sep  8 22:30 root
drwxr-xr-x  2 root root 12288 Sep  9 22:05 sbin
drwxr-xr-x  2 root root  4096 Mar 25 05:44 selinux
drwxr-xr-x  2 root root  4096 Jan  9  2009 srv
drwxr-xr-x  2 root root  4096 Mar 25 05:44 sys
drwxr-xr-x  3 root root  4096 Apr 10 07:03 tftpboot
drwxrwxrwt 17 root root  4096 Sep 10 14:57 tmp
drwxr-xr-x  2 root root  4096 Sep  2 22:13 u01
drwxr-xr-x 14 root root  4096 Apr 10 07:03 usr
drwxr-xr-x 21 root root  4096 Apr 10 07:05 var

Rene Kundersma
Oracle Technology Services, The Netherlands

September 18, 2009

Cold failover for a single instance RAC database

This blog posting is about protecting an instance of a 10.2/11.1 single instance RAC database so that it can act in a cold-failover situation. I want to refer to the great document "Using Oracle Clusterware to Protect a single instance oracle database 11g" written by my colleague Philip Newlan, since most of the input comes from there.

The mentioned pdf describes how to make sure a single instance database can failover with the use of Oracle Clusterware.

With this posting I want to show how to do this for a single instance RAC database where you have to manage 'instance1' and 'instance2' instead of just one instance.

Since the Grid Control agent may have problems with an instance[number] travelling from node1 to node2, this choice was made on purpose for a particular customer. Another reason for this awkward solution is that the application, for some reason, cannot run with two instances concurrently.

Also, since instance deployment with sequential instance numbers is standard in their grid environment, the choice was made to do this with different instance numbers instead of one.

Within this posting I will also show the required updates in tnsnames and spfile.

First, I made sure the OCR entries of the RAC database, the Services and the instances are removed from the OCR. Then a "resource group" will be created. This is the container for all the resources.

oracle@pts0138([crs]):/ora/product/11.1.0/crs> crs_profile -create \
oss.xdbprk.rg -t application -a \
/ora/product/11.1.0/crs/crs/public/act_resgroup.pl -o "ci=600" 
oracle@pts0138([crs]):/ora/product/11.1.0/crs> crs_register oss.xdbprk.rg

Now, let's verify the new entry:

oracle@pts0138([crs]):/ora/product/11.1.0/crs/crs/public> crsstat | grep rg
HA Resource                                   Target     State
oss.xdbprk.rg                                   OFFLINE    OFFLINE

The scripts mentioned in the pdf are placed in $CRS_HOME/crs/public on both nodes. I made sure the scripts are executable and tested them:

export CLUSTERWARE_HOME=/ora/product/11.1.0/crs/
export ORACLE_HOME=/ora/product/10.2.0/db_2
export _USR_ORA_LANG=$ORACLE_HOME 
export _USR_ORA_SRV=xdbprk2 
export _USR_ORA_FLAGS=1 
oracle@pts0138(xdbprk2):/tmp> $CLUSTERWARE_HOME/crs/public/act_db.pl start
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 10:12:21 2009
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
SQL> Connected to an idle instance.
SQL> ORACLE instance started.
Total System Global Area  536870912 bytes
Fixed Size                  2085360 bytes
Variable Size             150998544 bytes
Database Buffers          377487360 bytes
Redo Buffers                6299648 bytes
Database mounted.
Database opened.
SQL> Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,
Data Mining and Real Application Testing options

Also tested the stop function.

oracle@pts0138(xdbprk2):/tmp> $CLUSTERWARE_HOME/crs/public/act_db.pl stop
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 10:12:37 2009
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
SQL> Connected.
SQL> Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,
Data Mining and Real Application Testing options
oracle@pts0138(xdbprk2):/ora/product/11.1.0/crs/crs/public>

The action above was executed on both nodes, just to verify that the instance could be started and stopped with the scripts. The only problem was that the ORACLE_SID had to be changed for each node, as both nodes have a different ORACLE_SID for that database. With only one SID you would not have this problem.

Then, the failover resource was created and registered.

oracle@pts0138([crs]):/tmp> crs_profile -create oss.xdbprk.db-cold-failover \
-t application -r oss.xdbprk.rg -a \
/ora/product/11.1.0/crs/crs/public/act_db.pl \
-o "ci=20,ra=5,osrv=xdbprk,ol=/ora/product/10.2.0/db_2,oflags=1,rt=600"
oracle@pts0138([crs]):/tmp> crs_register oss.xdbprk.db-cold-failover

The value osrv=xdbprk will never work as this is not the correct instance name for any of the nodes. Even if I made the value osrv=xdbprk1, then the script would only work on one of the nodes i.e. the node that had the appropriate init.ora etc.

So, leaving the value at osrv=xdbprk, I hard-coded the ORACLE_SID in act_db.pl on both sides of the cluster. Since the value is now hard-coded, it should work. This clearly limits the option to re-use the script for other databases, so I'd better rename the script to act_db_xdbprk.pl if I do this for real.
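An alternative to hard-coding a different ORACLE_SID into each node's copy of the script would be to derive it from the host name. The shell sketch below only illustrates that idea and is not what was implemented here: act_db.pl itself is Perl, so this would have to live in a wrapper or be ported. The host-to-instance mapping follows the ps output in this posting (xdbprk1 on pts0101, xdbprk2 on pts0138); the function name sid_for_host is made up.

```shell
#!/bin/sh
# Sketch: map a cluster node name to the instance name that runs there.
# The mapping is an assumption based on this posting's two-node cluster.
sid_for_host() {
    case "$1" in
        pts0101) echo xdbprk1 ;;
        pts0138) echo xdbprk2 ;;
        *)
            echo "no instance mapped for host $1" >&2
            return 1
            ;;
    esac
}

# Possible use in a wrapper around the action script:
# ORACLE_SID=$(sid_for_host "$(hostname -s)") || exit 1
# export ORACLE_SID
```

With a lookup like this the same file could be placed on both nodes, at the cost of maintaining the mapping.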

So, how will the resource profiles look now ?

oracle@pts0101([crs]):/var/opt/oracle> crs_profile -print oss.xdbprk.rg
NAME=oss.xdbprk.rg
TYPE=application
ACTION_SCRIPT=/ora/product/11.1.0/crs/crs/public/act_resgroup.pl
ACTIVE_PLACEMENT=0
AUTO_START=restore
CHECK_INTERVAL=600
DESCRIPTION=oss.xdbprk.rg
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=
OPTIONAL_RESOURCES=
PLACEMENT=balanced
REQUIRED_RESOURCES=
RESTART_ATTEMPTS=1
SCRIPT_TIMEOUT=60
START_TIMEOUT=0
STOP_TIMEOUT=0
UPTIME_THRESHOLD=7d
USR_ORA_ALERT_NAME=
USR_ORA_CHECK_TIMEOUT=0
USR_ORA_CONNECT_STR=/ as sysdba
USR_ORA_DEBUG=0
USR_ORA_DISCONNECT=false
USR_ORA_FLAGS=
USR_ORA_IF=
USR_ORA_INST_NOT_SHUTDOWN=
USR_ORA_LANG=
USR_ORA_NETMASK=
USR_ORA_OPEN_MODE=
USR_ORA_OPI=false
USR_ORA_PFILE=
USR_ORA_PRECONNECT=none
USR_ORA_SRV=
USR_ORA_START_TIMEOUT=0
USR_ORA_STOP_MODE=immediate
USR_ORA_STOP_TIMEOUT=0
USR_ORA_VIP=
oracle@pts0101([crs]):/var/opt/oracle> crs_profile -print  oss.xdbprk.db-cold-failover
NAME=oss.xdbprk.db-cold-failover
TYPE=application
ACTION_SCRIPT=/ora/product/11.1.0/crs/crs/public/act_db.pl
ACTIVE_PLACEMENT=0
AUTO_START=restore
CHECK_INTERVAL=20
DESCRIPTION=oss.xdbprk.db-cold-failover
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=
OPTIONAL_RESOURCES=
PLACEMENT=balanced
REQUIRED_RESOURCES=oss.xdbprk.rg
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_TIMEOUT=600
STOP_TIMEOUT=0
UPTIME_THRESHOLD=7d
USR_ORA_ALERT_NAME=
USR_ORA_CHECK_TIMEOUT=0
USR_ORA_CONNECT_STR=/ as sysdba
USR_ORA_DEBUG=0
USR_ORA_DISCONNECT=false
USR_ORA_FLAGS=1
USR_ORA_IF=
USR_ORA_INST_NOT_SHUTDOWN=
USR_ORA_LANG=/ora/product/10.2.0/db_2
USR_ORA_NETMASK=
USR_ORA_OPEN_MODE=
USR_ORA_OPI=false
USR_ORA_PFILE=
USR_ORA_PRECONNECT=none
USR_ORA_SRV=xdbprk
USR_ORA_START_TIMEOUT=0
USR_ORA_STOP_MODE=immediate
USR_ORA_STOP_TIMEOUT=0
USR_ORA_VIP=

Okay, and then the basic test of starting the resource, first all is down:

oracle@pts0101([crs]):/var/opt/oracle> crsstat
HA Resource                                   Target     State
-----------                                   ------     -----
oss.xdbprk.db-cold-failover                     OFFLINE    OFFLINE
oss.xdbprk.rg                                   OFFLINE    OFFLINE

Then, the start:

oracle@pts0101([crs]):/var/opt/oracle> crs_start  oss.xdbprk.db-cold-failover
Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0138`
Start of `oss.xdbprk.db-cold-failover` on member `pts0138` succeeded.
oracle@pts0138([crs]):/var/opt/oracle> ps -ef | grep smon | grep dbprk
oracle   31568     1  0 11:51 ?        00:00:00 ora_smon_xdbprk2

And the relocate:

oracle@pts0101([crs]):/var/opt/oracle> crs_relocate -f oss.xdbprk.db-cold-failover
Attempting to stop `oss.xdbprk.db-cold-failover` on member `pts0138`
Stop of `oss.xdbprk.db-cold-failover` on member `pts0138` succeeded.
Attempting to stop `oss.xdbprk.rg` on member `pts0138`
Stop of `oss.xdbprk.rg` on member `pts0138` succeeded.
Attempting to start `oss.xdbprk.rg` on member `pts0101`
Start of `oss.xdbprk.rg` on member `pts0101` succeeded.
Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0101`
Start of `oss.xdbprk.db-cold-failover` on member `pts0101` succeeded.

Let's verify if it runs on the other node:

oracle@pts0101([crs]):/var/opt/oracle> ps -ef | grep smon | grep dbprk
oracle   11568     1  0 11:55 ?        00:00:00 ora_smon_xdbprk1

And stopped on the original:

oracle@pts0138(xdbprk2):/ora/product/10.2.0/db_2/dbs> ps -ef | grep smon | grep dbprk

Okay, what is left to do from here:

Service names that used to be managed by CRS now have to be hard-coded in the spfile so that they register with the listener each time:

service_names in spfile added:

dbprk.xe.grid
dbprk.xe.supp
dbprk.xe.link

Also, for each node, another local listener will be used. I made sure this is in the spfile:

xdbprk1.local_listener='pts0101-LOCAL-LISTENER'
xdbprk2.local_listener='pts0138-LOCAL-LISTENER'

In order to be sure that only one instance can be started, the cluster_database_instances parameter is set to 1.

*.cluster_database_instances=1

For tnsnames a normal failover entry is created: if the first instance is down, the next will be found (and should be running):

oracle@pts0101(xdbprk1):/ora/dbprk/admin/xdbprk1/bdump> tnsping dbprk.xe.grid
TNS Ping Utility for Linux: Version 10.2.0.4.0 - Production on 18-SEP-2009 12:39:14
Copyright (c) 1997,  2007, Oracle.  All rights reserved.
Used parameter files:
/etc/oss/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=pts0101-grid.nl.eu.abnamro.com)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=pts0138-grid.nl.eu.abnamro.com)(PORT=1521))(LOAD_BALANCE=ON))(CONNECT_DATA=(SERVICE_NAME=dbprk.xe.grid)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=20))))
OK (10 msec)

So, another basic test, what is the situation:

HA Resource                                   Target     State
-----------                                   ------     -----
oss.xdbprk.db-cold-failover                     ONLINE     ONLINE on pts0101
oss.xdbprk.rg                                   ONLINE     ONLINE on pts0101
oracle@pts0101([crs]):/ora/dbprk/admin/xdbprk1/bdump> crs_start  oss.xdbprk.db-cold-failover
Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0101`
Start of `oss.xdbprk.db-cold-failover` on member `pts0101` succeeded.

In which instance will my session end up ?

oracle@pts0101(xdbprk1):/ora/dbprk/admin/xdbprk1/bdump> sqlplus rk/rk@dbprk.xe.grid
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 12:39:55 2009
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,
Data Mining and Real Application Testing options
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
xdbprk1

And after the relocate, the session should go to the other instance:

oracle@pts0101([crs]):/ora/dbprk/admin/xdbprk1/bdump> crs_relocate -f oss.xdbprk.db-cold-failover
Attempting to stop `oss.xdbprk.db-cold-failover` on member `pts0101`
Stop of `oss.xdbprk.db-cold-failover` on member `pts0101` succeeded.
Attempting to stop `oss.xdbprk.rg` on member `pts0101`
Stop of `oss.xdbprk.rg` on member `pts0101` succeeded.
Attempting to start `oss.xdbprk.rg` on member `pts0138`
Start of `oss.xdbprk.rg` on member `pts0138` succeeded.
Attempting to start `oss.xdbprk.db-cold-failover` on member `pts0138`
Start of `oss.xdbprk.db-cold-failover` on member `pts0138` succeeded.
oracle@pts0101([crs]):/ora/dbprk/admin/xdbprk1/bdump> crsstat
HA Resource                                   Target     State
-----------                                   ------     -----
oss.xdbprk.db-cold-failover                     ONLINE     ONLINE on pts0138
oss.xdbprk.rg                                   ONLINE     ONLINE on pts0138
oracle@pts0101(xdbprk1):/ora/dbprk/admin/xdbprk1/bdump> sqlplus rk/rk@dbprk.xe.grid
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 12:41:22 2009
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,
Data Mining and Real Application Testing options
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
xdbprk2

As an extra test, to make sure the two instances cannot be started concurrently, I tried to start a second instance while the other one was already running.
As you can see, this is not possible.

oracle@pts0101(*):/var/opt/oracle> db xdbprk1
ORACLE_SID=xdbprk1
ORACLE_HOME=/ora/product/10.2.0/db_2
oracle@pts0101(xdbprk1):/var/opt/oracle> sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 12:42:02 2009
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Connected to an idle instance.
SQL> startup;
ORACLE instance started.
Total System Global Area  536870912 bytes
Fixed Size                  2085360 bytes
Variable Size             331353616 bytes
Database Buffers          197132288 bytes
Redo Buffers                6299648 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced

Rene Kundersma
Oracle Technology Services, The Netherlands

September 19, 2009

11gR2 Grid Infrastructure Installation

There is so much to tell about the new features that come with 11gR2, this new release gives me input for years ! Since the "Oracle Database New Features Guide 11g Release 2" does a good job here, I am not even trying to cover some of that.

I will however try to discuss some highlights or cool new things that changed since the previous (11gR1) release. 11gR2 Grid Infrastructure is one of those things.

11gR2 Grid Infrastructure combines Clusterware and ASM in one Oracle home and can be described as the next step in Grid Computing. If you are familiar with previous Clusterware and ASM releases, you will recognize the new functionality and way of working and realize this is indeed the next step in what we need for enabling Enterprise Grid. Deployment is simpler, faster and we are not talking about nodes anymore, but about services that live on resources.

One of the new features of 11gR2 is Grid Plug and Play, also called GPnP. Let me repeat what the documentation says about GPnP:

"Grid Plug and Play (GPnP) eliminates per-node configuration data and the need for explicit add and delete nodes steps. This allows a system administrator to take a template system image and run it on a new node with no further configuration. This removes many manual operations, reduces the opportunity for errors, and encourages configurations that can be changed easily. Removal of the per-node configuration makes the nodes easier to replace, because they do not need to contain individually-managed state.

Grid Plug and Play reduces the cost of installing, configuring, and managing database nodes by making their per-node state disposable. It allows nodes to be easily replaced with regenerated state"

Some of the key enablers for GPnP are GNS and DHCP. GNS, the Grid Naming Service is described here.

Since all of the requirements for a Grid Infrastructure installation are clearly documented in the "Grid Infrastructure installation guide", there is no need to discuss this.

This posting, however, is made to demo how to do an "Advanced Installation" of the Grid Infrastructure yourself, for education purposes: for example, a situation at home where you want to test the setup of Oracle Grid Infrastructure with your own DNS and DHCP server. In real life, at customer sites, DNS and DHCP servers are all in place and Oracle Grid Infrastructure can leverage these existing services.

Since most steps of the Oracle Grid Infrastructure installation are easy, I will only focus on the details I want to discuss regarding GNS and DHCP.

Oracle Grid Infrastructure can be downloaded here and when you made sure all prerequisites are checked you can start the installation by executing runInstaller.

It does make sense to install the Oracle Grid Infrastructure with a different user id than the Oracle database. For this, the Oracle documentation again has some sound examples. Because of this, I had to make sure permissions for directories and, for example, ASM disks are set up with 'grid' permissions instead of 'oracle' (both with oinstall as group).

I used user "grid" to install the Oracle Grid Infrastructure and since I wanted to install and configure the Oracle Grid Infrastructure I chose the first option.

inst-00154.jpg

A typical installation does not have GNS and since the purpose of the posting is to explain about the setup with GNS, the "Advanced Installation" option was chosen.

inst-00155.jpg

Language, speaks for itself.

inst-00156.jpg

Okay, this basically is the most important step of the setup. At this step you have to define the name for your cluster. In my case "cluster01", that is an easy one as there are no relations for this.

The SCAN name, however, is the "Single Client Access Name" and will be set up by the Oracle Grid Infrastructure. This SCAN name will resolve to three ip addresses within the cluster. The good news is that you don't have to do much for it, just make up a name that your clients will use later to access databases in your cluster. SCAN port 1521 is the default port we always use for SQLNet. The SCAN name has to be in the GNS Sub Domain, as explained below:

inst-00157.jpg

The option "Configure GNS" was checked. If this box were not checked, SCAN could still be used, but then I would have to set up the SCAN entries in DNS myself, with the SCAN name resolving to three different ip addresses.

However, since GNS is checked, the Grid Naming Service will be configured and GNS will setup my SCAN name. The only requirement is that a GNS Sub domain must be made and the DNS must be configured so that each request for this Sub Domain will be delegated to the GNS Sub Domain, so that GNS can handle the request.

The GNS VIP address is the ip address of the server that will host the GNS. You need to make sure this one is available for use.

You may ask yourself why this all is needed. Well, imagine yourself a cluster, where nodes are added and removed dynamically. In this situation, the complete administration with ip address management and name resolution management is done by the cluster itself. No need to do any manual work in updating connection strings, configuring ip numbers etc.

inst-00158.jpg

So how does it work:

First, my (named, linux) DNS is running on 10.161.102.40.
This DNS does the naming for cluster01.nl.oracle.com and pts.local.
For cluster01.nl.oracle.com a "delegation" is made, so that every request to a machine in the domain .cluster01.nl.oracle.com is delegated to the GNS. (with the GNS VIP).

In DNS:

cluster01.nl.oracle.com.     NS  gns.cluster01.nl.oracle.com.
gns.cluster01.nl.oracle.com. A   10.161.102.55

So, once the cluster installation is done, the GNS in the cluster will be started and a request to scan.cluster01.nl.oracle.com will be forwarded to the GNS. The GNS will then take care of the request and answer which three nodes in the cluster will serve as scan listeners:

[root@gridnode01pts05 ~]# nslookup scan.cluster01.nl.oracle.com
Server:         10.161.102.40
Address:        10.161.102.40#53
Non-authoritative answer:
Name:   scan.cluster01.nl.oracle.com
Address: 10.161.102.78
Name:   scan.cluster01.nl.oracle.com
Address: 10.161.102.79
Name:   scan.cluster01.nl.oracle.com
Address: 10.161.102.77

Also, with dig, you can see all information coming from GNS:

[root@dns-dhcp ~]# dig scan.cluster01.nl.oracle.com
; <<>> DiG 9.3.4-P1 <<>> scan.cluster01.nl.oracle.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46016
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 10, ADDITIONAL: 10
;; QUESTION SECTION:
;scan.cluster01.nl.oracle.com.  IN      A
;; ANSWER SECTION:
scan.cluster01.nl.oracle.com. 6 IN      A       10.161.102.78
scan.cluster01.nl.oracle.com. 6 IN      A       10.161.102.79
scan.cluster01.nl.oracle.com. 6 IN      A       10.161.102.77
;; AUTHORITY SECTION:
oracle.com.             10732   IN      NS      dns2.us.oracle.com.
oracle.com.             10732   IN      NS      dns3.us.oracle.com.
oracle.com.             10732   IN      NS      dns4.us.oracle.com.
oracle.com.             10732   IN      NS      dns1-us.us.oracle.com.
oracle.com.             10732   IN      NS      dnsmaster1.oracle.com.
oracle.com.             10732   IN      NS      dnsmaster2.oracle.com.
oracle.com.             10732   IN      NS      dnsmaster3.oracle.com.
oracle.com.             10732   IN      NS      dnsmaster4.oracle.com.
oracle.com.             10732   IN      NS      dnsmaster5.oracle.com.
oracle.com.             10732   IN      NS      dnsmaster6.oracle.com.
;; ADDITIONAL SECTION:
dns2.us.oracle.com.     3984    IN      A       130.35.249.52
dns3.us.oracle.com.     3984    IN      A       144.20.190.70
dns4.us.oracle.com.     3984    IN      A       138.2.202.15
dns1-us.us.oracle.com.  3984    IN      A       130.35.249.41
dnsmaster1.oracle.com.  1060    IN      A       192.135.82.4
dnsmaster2.oracle.com.  1060    IN      A       192.135.82.20
dnsmaster3.oracle.com.  1060    IN      A       192.135.82.36
dnsmaster4.oracle.com.  1060    IN      A       192.135.82.52
dnsmaster5.oracle.com.  1060    IN      A       192.135.82.70
dnsmaster6.oracle.com.  1060    IN      A       192.135.82.84
;; Query time: 0 msec
;; SERVER: 10.161.102.40#53(10.161.102.40)
;; WHEN: Sat Sep 19 17:15:47 2009
;; MSG SIZE  rcvd: 486
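To check quickly that the SCAN name resolves to the expected three addresses, the short form of the dig output can be counted. A minimal sketch, assuming dig is installed; the function name count_ipv4_lines is made up for illustration:

```shell
#!/bin/sh
# Sketch: count the lines on stdin that consist of a dotted-quad IPv4 address.
count_ipv4_lines() {
    grep -Ec '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# "dig +short" prints one address per line for A records, so this should
# report 3 for a working SCAN (needs a reachable DNS, so left commented out):
# dig +short scan.cluster01.nl.oracle.com | count_ipv4_lines
```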


Later, when the database is installed, you can use the SCAN with SQLNet EZ connect to connect to the database. Can't wait, I just have to demo it now:

[oracle@gridnode01pts05 ~]$ sqlplus system/oracle@scan.cluster01.nl.oracle.com:1521/dbpts05.pts.local
SQL*Plus: Release 11.2.0.1.0 Production on Sat Sep 19 17:11:32 2009
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> 

After specification of the GNS details, the details of the nodes within the cluster have to be entered. The node names you enter here also have to be resolvable. The management of the virtual ip addresses will be done automatically as long as a working DHCP service is available to serve ip addresses within that network. You can see here that the nodes can be in another domain than the GNS Sub Domain.

inst-00159.jpg

Click here for my dhcp config.

New in 11gR2 is the ability to let the installer configure ssh for you ! Great !

inst-00160.jpg

Even if it says that it will take several minutes, most of the time the ssh setup is done within the minute.

inst-00161.jpg

As in 10g, and 11gR1, this step is used to specify the public and internal interface.
Same as in 10g and 11gR1, the private interface will be used for the interconnect.

inst-00162.jpg

11gR2 has the new option to place the OCR and Voting disks on ASM storage, so that is what I will do.

inst-00163.jpg

After choosing ASM, the next step is to specify the disks that will be used for the ASM diskgroup. This is similar to 10g/11gR1; however, at that time this step was part of the DBCA.

inst-00164.jpg

Choosing 6 disks of 2GB with external redundancy.

inst-00165.jpg

You see the installer complaining about the not so complicated password that I chose (since this is not for production purposes).

inst-00166.jpg

inst-00167.jpg

Intelligent Platform Management is really cool, look at this. You can choose to let the Grid Infrastructure work with it.

inst-00168.jpg

In the real world it does make sense to separate the three groups!

inst-00169.jpg

inst-00170.jpg

Screen to specify the Oracle base and Software Location for the Grid Infrastructure. Remember, this location has to exist (and be readable and writable) on all nodes that you plan to install the software on.
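
A quick way to confirm this requirement up front is a small check that can be run on each node. This is only a sketch; the function name dir_ready is hypothetical, and the path and node names in the usage comment are placeholders, not the values from my installation.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given path is an existing
# directory that the current user can both read and write.
dir_ready() {
    [ -d "$1" ] && [ -r "$1" ] && [ -w "$1" ]
}

# Usage on the local node (placeholder path):
#   dir_ready /u01/app || echo "/u01/app is not ready on $(hostname)"
#
# Usage across nodes, assuming passwordless ssh is already in place
# (placeholder node names):
#   for node in node1 node2; do
#       ssh "$node" '[ -d /u01/app ] && [ -r /u01/app ] && [ -w /u01/app ]' \
#           || echo "$node: /u01/app not ready"
#   done
```

Run it as the user that will own the Grid Infrastructure software, since the read and write tests apply to the user executing the check.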

inst-00171.jpg

Location of the inventory

inst-00172.jpg

This is really great. The prerequisite checker now has the ability to generate a "fix-up" script.
So some (not all) requirements that are not set up correctly can be corrected by running the fix-up script generated by the prerequisite checker.

inst-00173.jpg

In my situation, some kernel parameters needed to be changed. That could be done easily with this tool.

inst-00174.jpg

Run the script on both nodes.

inst-00175.jpg

My swap space seems to be 1KB too small. This is because I used an Oracle VM template (yes, although it is not officially certified yet, I am running virtualized). I think I will manage with 1KB less than recommended.
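
To see for yourself how far the swap space is off, the configured swap can be compared with a required value. The sketch below assumes a Linux /proc/meminfo layout with a "SwapTotal:" line in kB; the function name swap_shortfall_kb is hypothetical, and the required value in the usage comment is a made-up number, not the one the installer uses.

```shell
#!/bin/sh
# Hypothetical helper: print how many kB of swap are missing compared to
# the required amount (0 if there is enough).
# Usage: swap_shortfall_kb <required_kb> [meminfo_file]
swap_shortfall_kb() {
    required_kb=$1
    meminfo=${2:-/proc/meminfo}   # a different file can be passed for testing
    awk -v need="$required_kb" '
        $1 == "SwapTotal:" { have = $2; found = 1 }
        END {
            if (!found) exit 1        # no SwapTotal line in the input
            short = need - have
            if (short < 0) short = 0
            print short
        }' "$meminfo"
}

# Usage (the required value is only an example):
#   swap_shortfall_kb 2097152
```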

inst-00176.jpg

Summary:

inst-00177.jpg

Progress:

inst-00179.jpg

Software being transferred to the other node:

inst-00180.jpg

Then the root scripts have to be run to set up permissions and configure the cluster.

inst-00181.jpg

Running orainstRoot.sh on both nodes:

inst-00182.jpg

Running root.sh on node 1; this is where the cluster configuration is done.

inst-00183.jpg

Note the voting file on the first ASM disk in diskgroup DATA:

inst-00184.jpg

And root.sh on node 1 finished...

inst-00185.jpg

Don't forget node 2:

inst-00186.jpg

inst-00187.jpg

After running the root scripts, you have to click OK. After this, some small post-configuration steps are performed by the installer, and cluvfy also runs. If this all runs fine, like it did here, the screen will quickly continue to the last page, telling you the installation was successful.

inst-00188.jpg

In my next postings I will show the installation and creation of an 11gR2 RAC database, as well as adding nodes to the cluster and adding instances to the RAC database.

Rene Kundersma
Oracle Technology Services, The Netherlands

About GRID

This page contains an archive of all entries posted to Oracle XPS The Netherlands On HA in the GRID category. They are listed from oldest to newest.
