Provisioning your GRID with Oracle VM Templates

Introduction (Chapter 1)

Linux node installation and configuration (virtualized or not) for an Oracle Grid environment can be done in various ways. Of course, one could do it all manually, but for larger environments that quickly becomes unworkable.

You also want to make sure each installation has the same specifications, and that human errors that may occur during the installation are kept to a minimum.

This blog entry will have chapters in which all details of an automated Oracle VM cloning process will be described.

The setup described below is used to prepare education environments. It will also work for proof-of-concept environments, and most parts of it may even be usable in your own Grid deployment strategy.

The setup described allows you to build Grid environments that students can use to learn, for instance, how to install RAC, configure Data Guard, or work with Enterprise Manager Grid Control. It can also be used to teach students how to work with Swingbench or FCF, all within their own infrastructure.

This virtualized solution helps to quickly set up, repair, catch up, restore and adapt the environment. It will save your IT department costs on hardware and storage, and it will save you lots of time.


Bare metal provisioning

Within the Oracle Grid, Oracle Enterprise Manager Grid Control release 10.2.0.4 with kickstart and PXE boot is used more often these days as a way to do a so-called "bare metal" installation of the OS:

[Image: kix.gif]

After this bare metal installation, "post configuration scripts" take care of the node-specific settings.

Even with the use of Oracle Virtual Machines on top of such a node, the kickstart procedure can still be used; without too much effort a PXE-boot configuration for virtualized guests can be set up.

This way of "bare metal installation" or better "virtual metal installation" by PXE-boot for VM Guests is a nice solution, which I will describe one day. But why would one do a complete installation for each VM while each VM differs only on a couple of configuration files ?

This blog entry explains how to use an Oracle VM template to provision virtual guest operating systems for a Grid environment.

For educational purposes, where classes with a lot of students each have to work with their own Grid environment, a procedure has been worked out to provision a blade system with operating systems and software, Grid ready, all based on Oracle VM templates.

As said, more options are possible; this is how my solution works, and it may work for you too:
1. An example OS configuration is provided (node-specific configuration files). From these template files a VM guest specific configuration is generated automatically. This configuration describes settings such as hostname, IP numbers, etc.
2. A VM template (image) is provided.

By automating the two steps above, one can easily and quickly set up virtualized Oracle Linux nodes, ready for RAC.

The next chapter is about the configuration templates and the cloning process.

The process (Chapter 2)

With the configuration templates described earlier, "configuration clones" can be made. In this example I am using HP blade technology. On each blade six VMs will be running. For each blade, and for each VM running on top of it, the configuration files are generated.

It makes sense to define configuration templates. With the use of scripts you can then generate configuration files for each specific VM from these templates.

With a VM template in one hand and an automatically generated set of configuration files in the other, you can quickly build, or rebuild, the infrastructure over and over again.

Even if you need to make changes that affect all VMs, they can be rolled out quite quickly.

As said, this solution is extremely useful for education purposes, or for situations where you have to provide lots of VM guests ready to be used instantly. Other possible uses include proof-of-concept environments.

In short, the workflow of the cloning process looks like this (a sketch of these steps follows the list):
1. A default virtual machine image is copied over.
2. Configuration files for the VM are generated, based upon the blade number, the VM number and the purpose of the VM.
3. The VM image is "mounted" and its configuration files are overwritten with the generated ones. Binaries (other programs) are also put in place.
4. The VM image is unmounted and, if needed, "file based shared storage" is created.
5. The VM boots for the first time, ready to use immediately, fully pre-configured.
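To make the workflow concrete, the sketch below strings these steps together for a single VM. It is only an illustration: the helper scripts (clone_conf.sh, mount_vm.sh, change_vm.sh) are described later in this post, while the template location on the NFS share is an assumption.

# Sketch: clone and prepare one VM (GRIDNODE01) for blade nlhpblade01.
# 1. copy the default VM image from the NFS share into the running pool
cp -a /OVS/nfs_mount/templates/el5u2_pvm /OVS/running_pool/GRIDNODE01
# 2. generate blade and VM specific configuration files (done once per blade)
./clone_conf.sh nlhpblade01
# 3. mount the image and overwrite its configuration files
./mount_vm.sh GRIDNODE01
./change_vm.sh GRIDNODE01
# 4. unmount the image by running the unmount script that mount_vm.sh prints,
#    e.g. /tmp/umount_GRIDNODE01.<pid>.sh
# 5. first boot: the VM comes up fully pre-configured
xm create /OVS/running_pool/GRIDNODE01/vm.cfg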


The concept itself can of course also be used for the Linux provisioning of your virtualized infrastructure as an alternative to bare metal provisioning.

The next chapter will describe the hardware used and the chosen storage solutions for this example.

Hardware used (Chapter 3)

As discussed in the previous chapter, this project is built on HP blade technology.

The solution described is of course independent of the hardware chosen.

However, in order to document the complete setup, this chapter describes the hardware used.

[Image: blade01.JPG]

This blade enclosure (C3000) has eight blades, and each blade has:
- two NICs (Broadcom)
- two HBAs (QLogic)
- 16 GB of RAM
- two quad-core Intel Xeon processors


Storage is made available to the blades via NFS and Fibre Channel.

The NFS share is used to provide the VM template that will be used as source.

The same NFS share is also available to the VM guests in order to provide the guests the option to install software from a shared location.
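A guest can then mount that share with a standard NFS mount; the server name and export path below are just placeholders, not the actual ones used:

mount -t nfs nfsserver:/export/software /mnt/software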

The SAN storage comes from an HP MSA. The MSA devices are used for OCFS2; this is where the VM image files will be placed.

Each blade is reachable via a public network interface.

A private network is also set up between the blades as the interconnect network for OCFS2.
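For reference, the OCFS2 cluster configuration on the blades lives in /etc/ocfs2/cluster.conf and could look like the fragment below. The node names, interconnect addresses and port here are purely illustrative, not the actual configuration used (note that the parameter lines in this file must be indented with a tab and stanzas are separated by blank lines):

node:
	ip_port = 7777
	ip_address = 10.0.1.1
	number = 0
	name = nlhpblade01
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 10.0.1.2
	number = 1
	name = nlhpblade02
	cluster = ocfs2

cluster:
	node_count = 8
	name = ocfs2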

For each blade, the architecture is equal to the diagram below.

[Image: blade02.jpg]

VM distribution (Chapter 4)

As said in an earlier chapter, each blade has 16GB RAM, so this is enough to run at least 6 VMs of 2GB RAM each.

The purpose is to have:
- 3 VMs for Real Application Clusters (RAC) (11.1.0.7 CRS/ASM/RDBMS)
- 1 VM for Data Guard (11.1.0.7 ASM/RDBMS)
- 1 VM to run Swingbench and demo applications
- 1 VM to run Enterprise Manager Grid Control (EMGC)


This will look like this:

[Image: blade03.jpg]

As each blade has 146 GB of local storage, there is room to keep some VMs on local disks. Since there is no intention to live migrate these nodes, they can be put on a non-shared location.

VM number six (EMGC) is too big to fit next to the other VMs on local storage. For this reason a shared OCFS2 mount is used.

Each VM uses the standard Oracle VM location for VMs (/OVS/running_pool). With a symbolic link, the storage for the EMGC VM is redirected to the OCFS2 shared disk: GRIDNODE09 -> /OVS_shared_large/running_pool/oemgc/nlhpblade07
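Creating such a link is a one-liner; assuming the directory on the shared OCFS2 volume already exists, it could be done like this:

ln -s /OVS_shared_large/running_pool/oemgc/nlhpblade07 /OVS/running_pool/GRIDNODE09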

By default OCFS2 allows four nodes to concurrently mount an OCFS2 filesystem. In order to mount the OCFS2 filesystem on all blades concurrently, you have to specify the -N X argument when executing mkfs, where X is the maximum number of nodes that will ever concurrently mount the OCFS2 filesystem.

mkfs.ocfs2 -b 4K -C 32K -N 8 -L ovmsdisk /dev/sdb1
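Should the chosen number of node slots turn out to be too small later on, it can normally be increased afterwards with tunefs.ocfs2 (the value and device below simply continue the example above):

tunefs.ocfs2 -N 16 /dev/sdb1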


PV Templates (Chapter 5)

Before doing any specific VM changes, first a template is chosen, in this case Oracle Enterprise Linux 5 update 2 (OEL5U2).

This is an Oracle VM template downloaded from OTN.

Our template is a para-virtualized template, based on a 32-bit architecture.

To remind you, this is how the para-virtualized architecture looks:

[Image: blade04.jpg]

By the way, para-virtualized guests often run faster than hardware virtualized guests.

Please see this link for more information on hardware vs. para-virtualized guests.

As part of the procedure described, the template will be copied over six times to each blade. In order to use the VMs on a specific blade for a specific purpose, configuration files must be created. The next chapter describes how this works.

VM Specific files and clone procedure (Chapter 6)

Each virtualized guest has a small set of configuration files that are specific to that OS. Typically these files exist both outside the guest (vm.cfg) and inside the guest.

Specific files inside the vm:
- /etc/sysconfig/network-scripts/ifcfg-eth*
- /etc/sysconfig/network
- ssh configuration files

Specific files outside the vm:
- vm.cfg

For VMs running on the same blade (and being part of the same 'grid') there are also files in common:
- nsswitch.conf
- resolv.conf
- sudoers
- sysctl.conf
- hosts

The files mentioned above need to be changed because each machine needs its own NICs with specific MAC addresses and its own IP numbers.

Of course, within a grid (on a blade) each VM has to have a unique name.

In order to make sure unique MAC addresses are generated, one has to define standards.

For the MAC addresses, the following formula is used: 00:16:3E:XD:0Y:0Z, where:
X: the number of the blade
Y: the number of the VM,
Z: the number of the NIC within that VM.

Host names will be used multiple times (but not within the same grid); the only things that need to change are the corresponding IP numbers, which must be unique across the grids.
For example, the MAC address for the second NIC on the third VM on blade 7 would look like: HWADDR=00:16:3E:7D:03:02
The same strategy is used to determine the IP numbers:
- For the public network 192.168.200.1XY is used.
- For the internal network 10.0.0.1XY is used.
- For the VIP 192.168.200.XY is used.

Where:
X: the number of the Blade
Y: the number of the VM

For example:
- the public IP number of node 3 on blade 7 would be: 192.168.200.173
- the private IP number of node 3 on blade 7 would be: 10.0.0.173
- the virtual IP number of node 3 on blade 7 would be: 192.168.200.73
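This numbering scheme is simple enough to put in a script. The fragment below is a hypothetical illustration only (the real generation is done by clone_conf.sh, whose contents are not listed in this post); it just shows how the MAC address and IP numbers could be derived from the blade, VM and NIC numbers:

#!/bin/bash
# Hypothetical sketch: derive MAC and IP numbers following the scheme above.
BLADE=$1   # blade number, e.g. 7
VM=$2      # VM number,    e.g. 3
NIC=$3     # NIC number,   e.g. 2

MAC="00:16:3E:${BLADE}D:0${VM}:0${NIC}"
PUBLIC_IP="192.168.200.1${BLADE}${VM}"
PRIVATE_IP="10.0.0.1${BLADE}${VM}"
VIP="192.168.200.${BLADE}${VM}"

echo "HWADDR=${MAC}"
echo "public: ${PUBLIC_IP}  private: ${PRIVATE_IP}  vip: ${VIP}"

Called as "./derive.sh 7 3 2" this prints HWADDR=00:16:3E:7D:03:02 and the IP numbers 192.168.200.173, 10.0.0.173 and 192.168.200.73, matching the examples above.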

So, from here, as long as you know for which blade and for which VM you will be generating the configuration, you can script that:
[root@nlhpblade07 tools]# ./clone_conf.sh nlhpblade01
Copying config files from /OVS_shared_large/conf/nlhpblade07 to /OVS_shared_large/conf/nlhpblade01...
Performing config changes specific to the blade and the VM...
# nlhpblade01 - GRIDNODE01 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE01/ifcfg-eth0
# nlhpblade01 - GRIDNODE01 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE01/ifcfg-eth1
# nlhpblade01 - GRIDNODE01 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE01/network
# nlhpblade01 - GRIDNODE01 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE01/vm.cfg
# nlhpblade01 - GRIDNODE02 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE02/ifcfg-eth0
# nlhpblade01 - GRIDNODE02 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE02/ifcfg-eth1
# nlhpblade01 - GRIDNODE02 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE02/network
# nlhpblade01 - GRIDNODE02 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE02/vm.cfg
# nlhpblade01 - GRIDNODE03 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE03/ifcfg-eth0
# nlhpblade01 - GRIDNODE03 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE03/ifcfg-eth1
# nlhpblade01 - GRIDNODE03 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE03/network
# nlhpblade01 - GRIDNODE03 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE03/vm.cfg
# nlhpblade01 - GRIDNODE04 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE04/ifcfg-eth0
# nlhpblade01 - GRIDNODE04 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE04/ifcfg-eth1
# nlhpblade01 - GRIDNODE04 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE04/network
# nlhpblade01 - GRIDNODE04 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE04/vm.cfg
# nlhpblade01 - GRIDNODE05 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE05/ifcfg-eth0
# nlhpblade01 - GRIDNODE05 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE05/ifcfg-eth1
# nlhpblade01 - GRIDNODE05 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE05/network
# nlhpblade01 - GRIDNODE05 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE05/vm.cfg
# nlhpblade01 - GRIDNODE09 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE09/ifcfg-eth0
# nlhpblade01 - GRIDNODE09 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE09/network
# nlhpblade01 - GRIDNODE09 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE09/vm.cfg
Performing node common changes for the configuration files...
# nlhpblade01 - GRIDNODE09 - /OVS_shared_large/conf/nlhpblade01/common/cluster.conf
# nlhpblade01 - GRIDNODE09 - /OVS_shared_large/conf/nlhpblade01/common/hosts
[root@nlhpblade07 tools]# 
'Mounting a VM' (Chapter 7)

Now that we have generated the node-specific configuration files and copied the base template, we are ready to modify the OS before even booting it. After 'mounting' the VM image file, the generated configuration will be copied into the VM.

As said, at this moment the VM is just an image file, for example /OVS/running_pool/GRIDNODE01/system.img. Xen will set up a loop device in order to boot the OS from that image.

We do something similar in order to change the OS before we boot it:

First, the losetup command is used to associate a loop device with the file. A loop device is a pseudo-device that makes a file accessible as a block device.
[root@nlhpblade07 GRIDNODE03]#  losetup /dev/loop9 system.img
Now that we have mapped the image file to a block device, we want to see the partitions on it. For this we use the command kpartx. Kpartx creates device maps from partition tables; it is part of the device-mapper multipath tools.
[root@nlhpblade07 GRIDNODE03]# kpartx -a /dev/loop9
So, let's see what partitions device-mapper has for us:
[root@nlhpblade07 GRIDNODE03]# ls /dev/mapper/loop9*
/dev/mapper/loop9p1  /dev/mapper/loop9p2  /dev/mapper/loop9p3
kpartx found three partitions and made them available through device-mapper. Let's see if we can identify the types:
[root@nlhpblade07 GRIDNODE03]# file -s /dev/mapper/loop9p1
/dev/mapper/loop9p1: Linux rev 1.0 ext3 filesystem data
This is probably the /boot partition of the vm.
[root@nlhpblade07 GRIDNODE03]# file -s /dev/mapper/loop9p2
/dev/mapper/loop9p2: LVM2 (Linux Logical Volume Manager) , UUID: t2SAm03KoxfUcCOS3OYmsXf9ubqcy9q
This may be the root or the swap partition.
[root@nlhpblade07 GRIDNODE03]# file -s /dev/mapper/loop9p3
/dev/mapper/loop9p3: LVM2 (Linux Logical Volume Manager) , UUID: j2U7KUWen1ePjDvm4hTclZvA5YJyvl9
This may also be the root or the swap partition.

So, in order to make a better guess at finding the root partition, let's see what the sizes are:
[root@nlhpblade07 GRIDNODE03]# fdisk -l /dev/mapper/loop9p2

Disk /dev/mapper/loop9p2: 13.8 GB, 13851371520 bytes
255 heads, 63 sectors/track, 1684 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/mapper/loop9p2 doesn't contain a valid partition table

[root@nlhpblade07 GRIDNODE03]# fdisk -l /dev/mapper/loop9p3

Disk /dev/mapper/loop9p3: 5362 MB, 5362882560 bytes
255 heads, 63 sectors/track, 652 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/mapper/loop9p3 doesn't contain a valid partition table
As we can see, one partition is 5 GB and the other 13 GB. The best guess would be that the 5 GB partition is the swap and the 13 GB partition holds the OS.

With the command vgscan we can scan the newly 'discovered' 'disks' and search for volume groups on them:
[root@nlhpblade07 GRIDNODE03]# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "VolGroup00" using metadata type lvm2
vgdisplay says we have one volume group (VolGroup00):
[root@nlhpblade07 GRIDNODE03]# vgdisplay
  --- Volume group ---
  VG Name               VolGroup00
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  5
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               17.84 GB
  PE Size               32.00 MB
  Total PE              571
  Alloc PE / Size       571 / 17.84 GB
  Free  PE / Size       0 / 0   
  VG UUID               kmhYBm-Mpbv-usx2-vDur-rEVb-uP4i-kcP4fc
With the command vgchange -a y we can make the logical volumes in a volume group available to the kernel.
[root@nlhpblade07 GRIDNODE03]# vgchange -a y VolGroup00
  2 logical volume(s) in volume group "VolGroup00" now active
lvdisplay can be used to see the attributes of a logical volume:
[root@nlhpblade07 GRIDNODE03]# lvdisplay
  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol00
  VG Name                VolGroup00
  LV UUID                B13hk3-f5qY-3gDY-Ackt-13gK-DZDc-cTWx3V
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                14.72 GB
  Current LE             471
  Segments               2
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:3
   
  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol01
  VG Name                VolGroup00
  LV UUID                iEO4oG-XPMU-syWF-qupo-811i-G6Gg-QZEw5f
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                3.12 GB
  Current LE             100
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:4
So, now we have found (and activated) the logical volume holding the root filesystem the VM lives on. Now we can mount it:
[root@nlhpblade07 GRIDNODE03]# mkdir guest_local_LogVol00; 
[root@nlhpblade07 GRIDNODE03]# mount /dev/VolGroup00/LogVol00 guest_local_LogVol00 
See the contents of the filesystem:
[root@nlhpblade07 GRIDNODE03]# cd guest_local_LogVol00/

[root@nlhpblade07 guest_local_LogVol00]# ls -la
total 224
drwxr-xr-x 26 root root  4096 Jan 14  2009 .
drwxr-xr-x  3 root root  4096 Oct 22 22:30 ..
-rw-r--r--  1 root root     0 Jul 24 05:02 .autorelabel
drwxr-xr-x  2 root root  4096 Dec 20  2008 bin
drwxr-xr-x  2 root root  4096 Jun  6 11:26 boot
drwxr-xr-x  4 root root  4096 Jun  6 11:26 dev
drwxr-xr-x 94 root root 12288 Jan 14  2009 etc
drwxr-xr-x  3 root root  4096 Jun  6 11:50 home
drwxr-xr-x 14 root root  4096 Dec 20  2008 lib
drwx------  2 root root 16384 Jun  6 11:26 lost+found
drwxr-xr-x  2 root root  4096 Apr 21  2008 media
drwxr-xr-x  2 root root  4096 May 22 09:51 misc
drwxr-xr-x  3 root root  4096 Dec 20  2008 mnt
dr-xr-xr-x  2 root root  4096 Jun 10 11:11 net
drwxr-xr-x  3 root root  4096 Aug 21 04:11 opt
-rw-r--r--  1 root root     0 Jan 14  2009 poweroff
drwxr-xr-x  2 root root  4096 Jun  6 11:26 proc
drwxr-x--- 17 root root  4096 Jan 13  2009 root
drwxr-xr-x  2 root root 12288 Dec 20  2008 sbin
drwxr-xr-x  4  500  500  4096 Jan 14  2009 scratch
drwxr-xr-x  2 root root  4096 Jun  6 11:26 selinux
drwxr-xr-x  2 root root  4096 Apr 21  2008 srv
drwxr-xr-x  2 root root  4096 Jun  6 11:26 sys
drwxr-xr-x  3 root root  4096 Jun  6 11:33 tftpboot
drwxrwxrwt  9 root root  4096 Jan 14  2009 tmp
drwxr-xr-x  3 root root  4096 Dec 20  2008 u01
drwxr-xr-x 14 root root  4096 Jun  6 11:31 usr
drwxr-xr-x 21 root root  4096 Jun  6 11:37 var
Although this is a rather easy way to mount a VM image file, it is still not something you want to do by hand for 40 VM images.

For this reason, the described solution is scripted and called mount_vm.sh. This is how it works:
[root@nlhpblade07 GRIDNODE05]# mount_vm.sh GRIDNODE05
Starting mount...

contents /etc/sysconfig/network -file of mounted node:
NETWORKING=yes

NETWORKING_IPV6=no
HOSTNAME=gridnode05.nl.oracle.com

Generating unmount script
To unmount your image run  /tmp/umount_GRIDNODE05.30992.sh as root

Mounting finished...
As you can see, the image is mounted by the script and a script to unmount it is generated automatically. In order to verify that the right image file is mounted, the contents of the file /etc/sysconfig/network are shown.
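The actual mount_vm.sh is not listed in this post, but based on the manual steps from this chapter a minimal version of it could look like the sketch below. The loop device, the mount point under the VM directory and the exact name of the generated unmount script are assumptions:

#!/bin/bash
# Hypothetical sketch of mount_vm.sh, following the manual steps above.
# Note: only one image containing VolGroup00 can be active at a time.
VM=$1
IMG=/OVS/running_pool/${VM}/system.img
LOOPDEV=/dev/loop9                               # assumption: a free loop device
MNT=/OVS/running_pool/${VM}/guest_local_LogVol00

echo "Starting mount..."
losetup ${LOOPDEV} ${IMG}             # map the image file to a block device
kpartx -a ${LOOPDEV}                  # create /dev/mapper entries for its partitions
vgscan > /dev/null                    # discover the volume group inside the image
vgchange -a y VolGroup00              # activate its logical volumes
mkdir -p ${MNT}
mount /dev/VolGroup00/LogVol00 ${MNT} # mount the guest root filesystem

echo "contents /etc/sysconfig/network -file of mounted node:"
cat ${MNT}/etc/sysconfig/network

echo "Generating unmount script"
UMOUNT=/tmp/umount_${VM}.$$.sh
cat > ${UMOUNT} <<EOF
#!/bin/bash
umount ${MNT}
vgchange -a n VolGroup00
kpartx -d ${LOOPDEV}
losetup -d ${LOOPDEV}
echo "Unmount finished"
EOF
chmod +x ${UMOUNT}
echo "To unmount your image run ${UMOUNT} as root"
echo "Mounting finished..."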

'Changing a VM' (Chapter 8)

Now that the VM image is mounted on the filesystem, we can go back to the generated config files. From here it is easy to copy all the specific configuration files into the VM. Better still is to have a script do this, and that is what is done for this solution:
[root@nlhpblade07 GRIDNODE05]# change_vm.sh GRIDNODE05
If you are sure, hit Y or y to continue
Y
Continuing...
Starting config change for VM GRIDNODE05 on nlhpblade07...
Copying swingbench...
Changing ownership of swingbench files...
Copying FCF-Java Demo...
Changing ownership of  FCF-Java Demo files...
This vm requires pre-build file /OVS/sharedDisk/rdbms_home_11r1_01_ocfs.img as shared Oracle RDBMS HOME
Finished changing config...
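The real change_vm.sh is not listed here either, but its core could be as simple as the sketch below. The configuration directory layout follows the clone_conf.sh output shown earlier; the mount point, the software source location on the NFS share and the oracle uid/gid are assumptions:

#!/bin/bash
# Hypothetical sketch of the core of change_vm.sh.
VM=$1
BLADE=$(hostname -s)
CONF=/OVS_shared_large/conf/${BLADE}
MNT=/OVS/running_pool/${VM}/guest_local_LogVol00   # mounted by mount_vm.sh

echo "Starting config change for VM ${VM} on ${BLADE}..."

# VM specific files (inside and outside the guest)
cp ${CONF}/${VM}/ifcfg-eth* ${MNT}/etc/sysconfig/network-scripts/
cp ${CONF}/${VM}/network    ${MNT}/etc/sysconfig/network
cp ${CONF}/${VM}/vm.cfg     /OVS/running_pool/${VM}/vm.cfg

# files common to all VMs on this blade
cp ${CONF}/common/hosts        ${MNT}/etc/hosts
mkdir -p ${MNT}/etc/ocfs2
cp ${CONF}/common/cluster.conf ${MNT}/etc/ocfs2/cluster.conf

# extra software, for example swingbench (source path and uid/gid 500 are assumptions)
cp -r /OVS/nfs_mount/software/swingbench ${MNT}/home/oracle/
chown -R 500:500 ${MNT}/home/oracle/swingbench

echo "Finished changing config..."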
Now that the VM is modified internally but still mounted, the unmount has to be done. This can be done by running the unmount script that was generated during the mount.
[root@nlhpblade07 GRIDNODE05]# /tmp/umount_GRIDNODE05.30992.sh

Unmount finished
If the unmount succeeded, you can remove this file

rm: remove regular file `/tmp/umount_GRIDNODE05.30992.sh'? y


All Together (Chapter 9)

In essence, the procedure described above should be repeated for each VM you want to clone and change, on each blade. This already saves you hours of work and reduces the chance of mistakes, but it may still seem like a lot of steps. Repeating it for each blade and each VM is, from here, just a matter of scripting.

So, you could make a script that, for each blade, would do the following:

Pseudo:
for each blade in blade list
do
    stop all VMs first
    while VMs are still running
    do
        wait 10 seconds
    done
    restore all machines (from NFS)
    clone conf
    mount and change all VMs
    start all VMs
done
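A bash rendering of this pseudo code might look like the sketch below. The blade and VM names, passwordless ssh from the master blade to the other blades, and the locations of the helper and restore scripts are all assumptions:

#!/bin/bash
# Hypothetical sketch: rebuild the VMs on every blade from one master blade.
BLADES="nlhpblade01 nlhpblade02 nlhpblade03 nlhpblade04 nlhpblade05 nlhpblade06 nlhpblade07"
VMS="GRIDNODE01 GRIDNODE02 GRIDNODE03 GRIDNODE04 GRIDNODE05 GRIDNODE09"

for BLADE in ${BLADES}
do
    # stop all VMs first and wait until they are really gone
    for VM in ${VMS}; do ssh ${BLADE} "xm shutdown ${VM}"; done
    while ssh ${BLADE} "xm list | grep -q GRIDNODE"
    do
        sleep 10
    done

    # restore the template images from NFS (restore_vms.sh is an assumed helper)
    ssh ${BLADE} "/OVS/tools/restore_vms.sh"

    # regenerate the configuration for this blade
    ./clone_conf.sh ${BLADE}

    # push the configuration into every image and start the VMs
    for VM in ${VMS}
    do
        ssh ${BLADE} "/OVS/tools/mount_vm.sh ${VM}"
        ssh ${BLADE} "echo Y | /OVS/tools/change_vm.sh ${VM}"
        ssh ${BLADE} "sh /tmp/umount_${VM}.*.sh"    # unmount script generated by mount_vm.sh
        ssh ${BLADE} "xm create /OVS/running_pool/${VM}/vm.cfg"
    done
done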
For all 42 VM images, the implemented version of this procedure runs for about four hours. After that, a complete 42-node education environment is set up.
[Image: NLHPBLADE TA3-1.jpg]
Extra Options (Chapter 10)

Besides changing configuration settings on a VM, as described in the 'Changing a VM' chapter, other activities can also be performed.

For this solution the following options are also implemented:
- configure VNC
- configure OCFS2 within the guest, to use a shared Oracle home and save space
- copy and configure software (like an Oracle home or Swingbench)
- create ASM disk files (see the sketch after this list)
- create OCR and Voting disks
- configure sudoers
- provide software for EM Agent deployment
- configure SSH
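As an illustration of the ASM and OCR/Voting disk options mentioned above: these disks in this setup are just files on the blade, presented to the guests as shared block devices. Creating and attaching such a file could look like the sketch below; the file name, size and guest device name in the vm.cfg entry are assumptions, not the exact ones used:

# create an empty 4 GB file to be used as an ASM disk
dd if=/dev/zero of=/OVS/sharedDisk/asm_disk01.img bs=1M count=4096

# then add it to the disk list in the vm.cfg of every RAC guest, for example:
#   'file:/OVS/sharedDisk/asm_disk01.img,xvdb,w!'
# (the w! mode allows the file to be attached to more than one guest read/write)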

Rene Kundersma
Oracle Expert Services, The Netherlands
Comments:

Hi Rene, This is a great blog entry, thanks for your practice sharing. I found 2 typos in chapter 6 a. |For the MAC addresses, the following formula is used: 0:16:3E:XD:0Y:0Z, where:| i guess the macaddr should begin with '00'. b. |- the private ip-number of node 3 on blade7 would be: 10.0.0.171|, the ipaddr should be 10.0.0.173

Posted by Shawn Zeng on March 12, 2009 at 11:05 AM PDT #

Shawn, Thanks for your compliment. I will fix the two typo soon ! Rene

Posted by Rene on March 12, 2009 at 10:51 PM PDT #
