Provisioning your GRID with Oracle VM Templates
By Rene Kundersma-Oracle on Mar 10, 2009
Linux node installation and configuration (virtualized or not) for an Oracle Grid environment can be done in various ways. One could do it all manually, but for larger environments this is simply not doable.
Also, you want to make sure each installation has the same specifications, and you want human errors during the installation kept to a minimum.
This blog entry will have chapters in which all details of an automated Oracle VM cloning process will be described.
The setup described below is used to prepare education environments. It will also work for proof-of-concept environments, and most parts of it may even be usable in your own Grid deployment strategy.
The setup allows you to set up Grid environments that students can use to learn (for instance) how to install RAC, configure Data Guard, or work with Enterprise Manager Grid Control. It can also be used to teach students how to work with Swingbench or FCF, all within their own infrastructure.
This virtualized solution helps to quickly set up, repair, catch up, restore and adapt the environment. It will save your IT department costs on hardware and storage, and it will save you lots of time.
Bare metal provisioning
Within the Oracle Grid, Oracle Enterprise Manager Grid Control release 10.2.0.4 with kickstart and PXE boot is used more often these days to do a so-called "bare metal" installation of the OS.
After this bare metal installation, "post configuration scripts" take care of the node-specific settings.
Even with the use of Oracle Virtual Machines on top of such a node, the kickstart procedure can still be used; without too much effort a PXE-boot configuration for virtualized guests can be set up.
This way of "bare metal installation", or better "virtual metal installation", by PXE boot for VM guests is a nice solution, which I will describe one day. But why would one do a complete installation for each VM when each VM differs in only a couple of configuration files?
This blog entry explains how to use an Oracle VM template to provision virtual guest operating systems for use in a Grid.
For educational purposes, where classes with a lot of students have to work each with their own Grid environment, a procedure is worked out to provision a blade system with operating systems and software, Grid ready, all based on Oracle templates.
As said, more options are possible; this is how my solution works, and it may work for you also.
1. An example OS configuration is provided (node-specific configuration files). From these template files a VM guest specific configuration is generated automatically. This configuration describes settings for hostname, IP numbers, etc.
2. A VM template (image) is provided.
By automating the two steps above, one can easily and quickly set up virtualized Oracle Linux nodes, ready for RAC.
The next chapter will be about the configuration templates and the cloning process.
The process (Chapter 2)
With the configuration templates described earlier, "configuration clones" can be made. In this example I am using HP blade technology. On each blade six VMs will be running. For each blade, and for each VM running on top of it, the configuration files are generated.
It makes sense to define configuration templates. With scripts you can use these templates to generate configuration files for each specific VM.
With a VM template in one hand and an automatically generated set of configuration files in the other, you can quickly build or rebuild the infrastructure over and over again.
Even if you need to make changes that affect all VMs, they can be rolled out quite quickly.
As said, this solution is extremely useful for education purposes, or situations where you have to provide lots of VM guests ready to be used instantly. Possible other uses are in proof of concept environments.
In short, the workflow of the cloning process looks like the following:
1. A default virtual machine image is copied over
2. Configuration files for the VM are generated, based upon the blade number and vm number and purpose of the VM
3. The VM image is "mounted" and configuration files are overwritten with the generated configuration files. Also binaries (other programs) are put in place
4. The VM image is unmounted and if needed "file based shared storage" is created.
5. The VM boots for the first time, ready to use immediately, totally pre-configured
The concept itself can of course also be used for the Linux provisioning of your virtualized infrastructure as an alternative to bare metal provisioning.
The next chapter will describe the hardware used and the chosen storage solutions for this example.
Hardware used (Chapter 3)
As discussed in the previous chapter, this project is built on HP blade technology.
The solution described is of course independent of the hardware chosen.
However, in order to describe the complete setup this chapter is here to describe the hardware used.
This blade enclosure (C3000) has eight blades, each blade has:
- two nics (broadcom)
- two hba's (qlogic)
- 16 GB of RAM
- two quad core Intel Xeon processors
Storage is made available to the blades by NFS and Fibre Channel.
The NFS share is used to provide the VM template that will be used as source.
The same NFS share is also available to the VM guests in order to provide the guests the option to install software from a shared location.
The SAN storage comes from an HP MSA. The MSA devices are used for OCFS2. This is where the VM image files will be placed.
Each blade is available by a public network interface.
A private network is also set up as the interconnect network for OCFS2 between the blades.
For each blade the architecture is as shown in the diagram below.
VM distribution (Chapter 4)
As said in an earlier chapter, each blade has 16GB RAM, so this is enough to run at least 6 VMs of 2GB RAM each.
The purpose is to have:
- 3 vms for Real Application Clusters (RAC) (18.104.22.168 CRS/ASM/RDBMS)
- 1 vm for Dataguard (22.214.171.124 ASM/RDBMS)
- 1 vm to run swingbench and demo applications
- 1 vm to run Enterprise Manager Grid Control (EMGC).
This will look this way:
As each blade has 146 GB of local storage, there is room to have some VMs on local disks. Since there is no intention to live migrate these nodes, they can be put on a non-shared location.
VM number six (EMGC) is too big to fit next to the other VMs on local storage. For this reason a shared OCFS2 mount is made.
Each VM uses the Oracle VM provided location for the VMs (/OVS/running_pool). With symbolic links the storage for the EMGC VM is brought to the OCFS2 shared disk:
GRIDNODE09 -> /OVS_shared_large/running_pool/oemgc/nlhpblade07
By default OCFS2 allows four nodes to mount an OCFS2 filesystem concurrently. In order to mount the OCFS2 filesystem on all blades concurrently, you have to specify the -N X argument when running mkfs.ocfs2, where X is the maximum number of nodes that will ever mount the filesystem concurrently:
mkfs.ocfs2 -b 4K -C 32K -N 8 -L ovmsdisk /dev/sdb1
PV Templates (Chapter 5)
Before doing any specific VM changes, first a template is chosen, in this case Oracle Enterprise Linux 5 update 2 (OEL5U2).
This is an Oracle VM template downloaded from OTN.
Our template is a para-virtualized template, based on a 32bit architecture.
To remind you, this is how the para-virtualized architecture looks:
By the way, para-virtualized kernels often work faster than hardware virtualized guests.
Please see this link for more information on hardware vs. para-virtualized guests.
As part of the procedure described, the template will be copied over six times to each blade. In order to use the VMs on a specific blade for a specific purpose configuration files must be made. The next chapter describes how this works.
VM Specific files and clone procedure (Chapter 6)
Each virtualized guest has a small set of configuration files that are specific to that OS. Typically these files exist both outside of the guest (vm.cfg) and inside the guest.
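Specific files inside the VM:
- ifcfg-eth0 and ifcfg-eth1 (network interface configuration)
- /etc/sysconfig/network
- ssh configuration files
Specific files outside the VM:
- vm.cfg
For VMs running on the same blade (and being part of the same 'grid') there are also files in common:
- cluster.conf
- hosts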
The files mentioned above need to be changed, because each machine needs its own NICs with specific MAC addresses and its own IP numbers.
Of course, within a grid (on a blade) each VM has to have a unique name.
In order to make sure unique MAC addresses are generated, one has to set up standards.
For the MAC addresses, the following formula is used: 00:16:3E:XD:0Y:0Z, where:
X: the number of the blade
Y: the number of the VM,
Z: the number of the NIC within that VM.
Host names will be reused across grids (but not within the same grid); the only things that need to change are the corresponding IP numbers, which must be unique across the grids.
For example, the MAC address for the second NIC on the third VM on blade 7 would look like: HWADDR=00:16:3E:7D:03:02
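The MAC address convention can be captured in a small shell function. This is only a sketch: the function name mac_for is hypothetical and not taken from the original scripts, and it assumes single-digit blade, VM and NIC numbers passed as plain integers.

```shell
# Build a guest MAC address following the 00:16:3E:XD:0Y:0Z convention.
# mac_for is a hypothetical helper name; arguments are assumed to be
# single-digit integers without leading zeros.
mac_for() {
    local blade=$1 vm=$2 nic=$3
    printf '00:16:3E:%dD:0%d:0%d\n' "$blade" "$vm" "$nic"
}

# Second NIC on the third VM on blade 7
mac_for 7 3 2
```

The call at the end reproduces the example from the text, and the result can be written into the HWADDR= line of the generated interface file.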
The same strategy is used to determine the ip-numbers to be used:
- For the public network 192.168.200.1XY is used.
- For the internal network 10.0.0.1XY is used
- For the vip 192.168.200.XY is used.
X: the number of the Blade
Y: the number of the VM
- the public ip-number of node 3 on blade7 would be: 192.168.200.173
- the private ip-number of node 3 on blade7 would be: 10.0.0.173
- the virtual ip-number of node 3 on blade7 would be: 192.168.200.73
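The IP numbering scheme lends itself to the same kind of helper. The sketch below uses hypothetical function names (public_ip, private_ip, virtual_ip) and assumes single-digit blade and VM numbers, as in the examples above.

```shell
# Derive the addresses of a guest from its blade number (X) and VM number (Y).
# The function names are hypothetical; X and Y are assumed to be single digits.
public_ip()  { printf '192.168.200.1%d%d\n' "$1" "$2"; }
private_ip() { printf '10.0.0.1%d%d\n' "$1" "$2"; }
virtual_ip() { printf '192.168.200.%d%d\n' "$1" "$2"; }

# Node 3 on blade 7
public_ip 7 3
private_ip 7 3
virtual_ip 7 3
```

Keeping the rules in one place like this means a change in the numbering standard only has to be made once before regenerating all configuration files.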
So, from here, as long as you know for which blade and for which VM you will be generating the configuration, you can script that:
[root@nlhpblade07 tools]# ./clone_conf.sh nlhpblade01
Copying config files from /OVS_shared_large/conf/nlhpblade07 to /OVS_shared_large/conf/nlhpblade01...
Performing config changes specific to the blade and the VM...
# nlhpblade01 - GRIDNODE01 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE01/ifcfg-eth0
# nlhpblade01 - GRIDNODE01 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE01/ifcfg-eth1
# nlhpblade01 - GRIDNODE01 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE01/network
# nlhpblade01 - GRIDNODE01 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE01/vm.cfg
# nlhpblade01 - GRIDNODE02 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE02/ifcfg-eth0
# nlhpblade01 - GRIDNODE02 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE02/ifcfg-eth1
# nlhpblade01 - GRIDNODE02 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE02/network
# nlhpblade01 - GRIDNODE02 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE02/vm.cfg
# nlhpblade01 - GRIDNODE03 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE03/ifcfg-eth0
# nlhpblade01 - GRIDNODE03 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE03/ifcfg-eth1
# nlhpblade01 - GRIDNODE03 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE03/network
# nlhpblade01 - GRIDNODE03 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE03/vm.cfg
# nlhpblade01 - GRIDNODE04 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE04/ifcfg-eth0
# nlhpblade01 - GRIDNODE04 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE04/ifcfg-eth1
# nlhpblade01 - GRIDNODE04 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE04/network
# nlhpblade01 - GRIDNODE04 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE04/vm.cfg
# nlhpblade01 - GRIDNODE05 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE05/ifcfg-eth0
# nlhpblade01 - GRIDNODE05 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE05/ifcfg-eth1
# nlhpblade01 - GRIDNODE05 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE05/network
# nlhpblade01 - GRIDNODE05 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE05/vm.cfg
# nlhpblade01 - GRIDNODE09 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE09/ifcfg-eth0
# nlhpblade01 - GRIDNODE09 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE09/network
# nlhpblade01 - GRIDNODE09 - /OVS_shared_large/conf/nlhpblade01/GRIDNODE09/vm.cfg
Performing node common changes for the configuration files...
# nlhpblade01 - GRIDNODE09 - /OVS_shared_large/conf/nlhpblade01/common/cluster.conf
# nlhpblade01 - GRIDNODE09 - /OVS_shared_large/conf/nlhpblade01/common/hosts
[root@nlhpblade07 tools]#
'mounting a vm' (Chapter 7)
Now that we have generated the node-specific configuration files and copied the basic template, we are ready to modify the OS before even booting it. After 'mounting' the VM image file, the generated configuration will be copied over into the VM.
As said, at this moment the VM is an image file, for example /OVS/running_pool/GRIDNODE01/system.img. Xen will set up a loop device in order to boot the OS from that image.
We do more or less the same in order to change the OS before we boot it:
First, the losetup command is used to associate a loop device with the file. A loop device is a pseudo-device that makes a file accessible as a block device.
[root@nlhpblade07 GRIDNODE03]# losetup /dev/loop9 system.img
Now that we have mapped the image file to a block device, we want to see the partitions on it. For this we use the command kpartx, which creates device maps from partition tables. kpartx is part of the device-mapper multipath tools.
[root@nlhpblade07 GRIDNODE03]# kpartx -a /dev/loop9
So, let's see what partitions device-mapper has for us:
[root@nlhpblade07 GRIDNODE03]# ls /dev/mapper/loop9*
/dev/mapper/loop9p1 /dev/mapper/loop9p2 /dev/mapper/loop9p3
kpartx found three partitions and told device-mapper that they are available. Let's see if we can identify the types:
[root@nlhpblade07 GRIDNODE03]# file -s /dev/mapper/loop9p1
/dev/mapper/loop9p1: Linux rev 1.0 ext3 filesystem data
This is probably the /boot partition of the VM.
[root@nlhpblade07 GRIDNODE03]# file -s /dev/mapper/loop9p2
/dev/mapper/loop9p2: LVM2 (Linux Logical Volume Manager) , UUID: t2SAm03KoxfUcCOS3OYmsXf9ubqcy9q
This may be the root or the swap partition.
[root@nlhpblade07 GRIDNODE03]# file -s /dev/mapper/loop9p3
/dev/mapper/loop9p3: LVM2 (Linux Logical Volume Manager) , UUID: j2U7KUWen1ePjDvm4hTclZvA5YJyvl9
This may also be the root or the swap partition.
So, in order to make a better guess in finding the root partition, let's see what the sizes are:
[root@nlhpblade07 GRIDNODE03]# fdisk -l /dev/mapper/loop9p2
Disk /dev/mapper/loop9p2: 13.8 GB, 13851371520 bytes
255 heads, 63 sectors/track, 1684 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/mapper/loop9p2 doesn't contain a valid partition table
[root@nlhpblade07 GRIDNODE03]# fdisk -l /dev/mapper/loop9p3
Disk /dev/mapper/loop9p3: 5362 MB, 5362882560 bytes
255 heads, 63 sectors/track, 652 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/mapper/loop9p3 doesn't contain a valid partition table
As we can see, one partition is 5GB and the other is 13GB. The best guess would be that the 5GB partition is the swap and the 13GB partition holds the OS.
With the command vgscan we can scan the newly 'discovered' 'disks' and search for volume groups on them:
[root@nlhpblade07 GRIDNODE03]# vgscan
Reading all physical volumes. This may take a while...
Found volume group "VolGroup00" using metadata type lvm2
vgdisplay says we have one volume group (VolGroup00):
[root@nlhpblade07 GRIDNODE03]# vgdisplay
--- Volume group ---
VG Name VolGroup00
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 5
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 0
Max PV 0
Cur PV 2
Act PV 2
VG Size 17.84 GB
PE Size 32.00 MB
Total PE 571
Alloc PE / Size 571 / 17.84 GB
Free PE / Size 0 / 0
VG UUID kmhYBm-Mpbv-usx2-vDur-rEVb-uP4i-kcP4fc
With the command vgchange -a y we can make the logical volumes available to the kernel.
[root@nlhpblade07 GRIDNODE03]# vgchange -a y VolGroup00
2 logical volume(s) in volume group "VolGroup00" now active
lvdisplay can be used to see the attributes of a logical volume:
[root@nlhpblade07 GRIDNODE03]# lvdisplay
--- Logical volume ---
LV Name /dev/VolGroup00/LogVol00
VG Name VolGroup00
LV UUID B13hk3-f5qY-3gDY-Ackt-13gK-DZDc-cTWx3V
LV Write Access read/write
LV Status available
# open 0
LV Size 14.72 GB
Current LE 471
Segments 2
Allocation inherit
Read ahead sectors 0
Block device 253:3
--- Logical volume ---
LV Name /dev/VolGroup00/LogVol01
VG Name VolGroup00
LV UUID iEO4oG-XPMU-syWF-qupo-811i-G6Gg-QZEw5f
LV Write Access read/write
LV Status available
# open 0
LV Size 3.12 GB
Current LE 100
Segments 1
Allocation inherit
Read ahead sectors 0
Block device 253:4
So, now we have found (and activated) the logical volume that holds the root filesystem of the VM. Now we can mount it:
[root@nlhpblade07 GRIDNODE03]# mkdir guest_local_LogVol00;
[root@nlhpblade07 GRIDNODE03]# mount /dev/VolGroup00/LogVol00 guest_local_LogVol00
See the contents of the filesystem:
[root@nlhpblade07 GRIDNODE03]# cd guest_local_LogVol00/
[root@nlhpblade07 guest_local_LogVol00]# ls -la
total 224
drwxr-xr-x 26 root root 4096 Jan 14 2009 .
drwxr-xr-x 3 root root 4096 Oct 22 22:30 ..
-rw-r--r-- 1 root root 0 Jul 24 05:02 .autorelabel
drwxr-xr-x 2 root root 4096 Dec 20 2008 bin
drwxr-xr-x 2 root root 4096 Jun 6 11:26 boot
drwxr-xr-x 4 root root 4096 Jun 6 11:26 dev
drwxr-xr-x 94 root root 12288 Jan 14 2009 etc
drwxr-xr-x 3 root root 4096 Jun 6 11:50 home
drwxr-xr-x 14 root root 4096 Dec 20 2008 lib
drwx------ 2 root root 16384 Jun 6 11:26 lost+found
drwxr-xr-x 2 root root 4096 Apr 21 2008 media
drwxr-xr-x 2 root root 4096 May 22 09:51 misc
drwxr-xr-x 3 root root 4096 Dec 20 2008 mnt
dr-xr-xr-x 2 root root 4096 Jun 10 11:11 net
drwxr-xr-x 3 root root 4096 Aug 21 04:11 opt
-rw-r--r-- 1 root root 0 Jan 14 2009 poweroff
drwxr-xr-x 2 root root 4096 Jun 6 11:26 proc
drwxr-x--- 17 root root 4096 Jan 13 2009 root
drwxr-xr-x 2 root root 12288 Dec 20 2008 sbin
drwxr-xr-x 4 500 500 4096 Jan 14 2009 scratch
drwxr-xr-x 2 root root 4096 Jun 6 11:26 selinux
drwxr-xr-x 2 root root 4096 Apr 21 2008 srv
drwxr-xr-x 2 root root 4096 Jun 6 11:26 sys
drwxr-xr-x 3 root root 4096 Jun 6 11:33 tftpboot
drwxrwxrwt 9 root root 4096 Jan 14 2009 tmp
drwxr-xr-x 3 root root 4096 Dec 20 2008 u01
drwxr-xr-x 14 root root 4096 Jun 6 11:31 usr
drwxr-xr-x 21 root root 4096 Jun 6 11:37 var
Although this seems a rather easy way to mount a VM image file, it is still not something you want to do by hand for 40 VM images.
For this reason, the described solution is scripted and called mount_vm.sh. This is how it works:
[root@nlhpblade07 GRIDNODE05]# mount_vm.sh GRIDNODE05
Starting mount...
contents /etc/sysconfig/network -file of mounted node:
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=gridnode05.nl.oracle.com
Generating unmount script
To unmount your image run /tmp/umount_GRIDNODE05.30992.sh as root
Mounting finished...
As you can see, the image is mounted by a script, and a script to unmount it is generated automatically. In order to verify that the right image file is mounted, the contents of the file /etc/sysconfig/network are shown.
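The source of mount_vm.sh is not shown here, so the following is only a sketch of how the manual steps from this chapter might be bundled. The function names mount_commands and umount_commands are hypothetical. Both functions only print the commands instead of running them, so the sequence can be inspected first; the teardown simply reverses the setup.

```shell
# Print the commands that mount the root logical volume of a guest image.
# Nothing is executed; the output is meant to be reviewed and then run as root.
mount_commands() {
    local image=$1 loopdev=$2 vg=$3 lv=$4 mountpoint=$5
    printf '%s\n' \
        "losetup $loopdev $image" \
        "kpartx -a $loopdev" \
        "vgscan" \
        "vgchange -a y $vg" \
        "mkdir -p $mountpoint" \
        "mount /dev/$vg/$lv $mountpoint"
}

# Print the matching teardown, in reverse order of the setup.
umount_commands() {
    local loopdev=$1 vg=$2 mountpoint=$3
    printf '%s\n' \
        "umount $mountpoint" \
        "vgchange -a n $vg" \
        "kpartx -d $loopdev" \
        "losetup -d $loopdev"
}

mount_commands /OVS/running_pool/GRIDNODE03/system.img /dev/loop9 \
    VolGroup00 LogVol00 guest_local_LogVol00
```

Redirecting the output of umount_commands to a file would give an unmount script similar in spirit to the generated one mentioned above. Since every clone of the template carries a volume group with the same name (VolGroup00), it is probably safest to mount one image at a time.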
'changing a vm' (Chapter 8)
Now that the VM image is mounted on the filesystem, we can go back to the generated config files. From here it is easy to copy all specific configuration files over to the VM. Better still is to have a script do this, and that is what has been done for this solution:
[root@nlhpblade07 GRIDNODE05]# change_vm.sh GRIDNODE05
If you are sure, hit Y or y to continue
Y
Continuing...
Starting config change for VM GRIDNODE05 on nlhpblade07...
Copying swingbench...
Changing ownership of swingbench files...
Copying FCF-Java Demo...
Changing ownership of FCF-Java Demo files...
This vm requires pre-build file /OVS/sharedDisk/rdbms_home_11r1_01_ocfs.img as shared Oracle RDBMS HOME
Finished changing config...
Now that the VM is modified internally and still mounted, it has to be unmounted. This can be done by running the unmount script that was generated during the mount.
[root@nlhpblade07 GRIDNODE05]# /tmp/umount_GRIDNODE05.30992.sh
Unmount finished
If the unmount succeeded, you can remove this file
rm: remove regular file `/tmp/umount_GRIDNODE05.30992.sh'? y
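The configuration copy that happens while the image is mounted could look roughly like the sketch below. The function name apply_config is hypothetical, and the target paths are assumptions based on the usual locations on an Enterprise Linux 5 guest (/etc/sysconfig/network-scripts, /etc/sysconfig/network, /etc/hosts and /etc/ocfs2/cluster.conf); the real change_vm.sh also copies software such as Swingbench, which is left out here.

```shell
# Copy generated configuration files into a mounted guest filesystem.
# conf_dir:   per-VM directory holding ifcfg-eth* and network
# common_dir: per-blade directory holding hosts and cluster.conf
# mnt:        mount point of the guest root filesystem
# All target paths are assumptions, not taken from the original script.
apply_config() {
    local conf_dir=$1 common_dir=$2 mnt=$3
    local f
    for f in "$conf_dir"/ifcfg-eth*; do
        [ -e "$f" ] || continue
        cp "$f" "$mnt/etc/sysconfig/network-scripts/" || return 1
    done
    cp "$conf_dir/network" "$mnt/etc/sysconfig/network" || return 1
    cp "$common_dir/hosts" "$mnt/etc/hosts" || return 1
    if [ -f "$common_dir/cluster.conf" ]; then
        mkdir -p "$mnt/etc/ocfs2" || return 1
        cp "$common_dir/cluster.conf" "$mnt/etc/ocfs2/cluster.conf" || return 1
    fi
}
```

A call such as apply_config /OVS_shared_large/conf/nlhpblade01/GRIDNODE01 /OVS_shared_large/conf/nlhpblade01/common guest_local_LogVol00 would then push one VM's generated files into its mounted image. The vm.cfg file is not copied, since it lives outside the guest.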
All Together (Chapter 9)
In essence, the procedure described above should be repeated for each VM you want to clone and change on each blade. This may already save you hours of work and reduce the chance of mistakes, but it may still seem like a lot of steps. Repeating this for each blade and each VM is, from here, just a matter of scripting.
So, you could make a script, that for each blade, would do the following:
Pseudo:
for each blade in blade list
do
  Stop all vms first
  while vms still running
  do
    wait 10 seconds
  done
  Restore all machines (from NFS)
  Clone conf
  Mount and change all vms
  Start all
done
For all 42 VM images the implemented version of the script above runs for about four hours. After this a complete 42-node education environment is set up.
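The loop above could be sketched in shell as a dry run that only prints each step. In this sketch, stop_vms, restore_images and start_vms are hypothetical placeholders for steps whose implementation is not shown in this post, and vms_still_running is a stub; clone_conf.sh, mount_vm.sh and change_vm.sh are the scripts named earlier.

```shell
# Dry-run sketch of the per-blade provisioning loop: steps are printed, not run.
step() { echo "[$1] $2"; }

# Stub: always reports that no VM is running. A real check might look at the
# list of running guests on the blade instead.
vms_still_running() { return 1; }

provision_blade() {
    local blade=$1; shift
    local vm
    step "$blade" "stop_vms"
    while vms_still_running "$blade"; do
        sleep 10
    done
    step "$blade" "restore_images (from NFS)"
    step "$blade" "clone_conf.sh $blade"
    for vm in "$@"; do
        step "$blade" "mount_vm.sh $vm"
        step "$blade" "change_vm.sh $vm"
        step "$blade" "run generated unmount script for $vm"
    done
    step "$blade" "start_vms"
}

for blade in nlhpblade01 nlhpblade07; do
    provision_blade "$blade" GRIDNODE01 GRIDNODE02
done
```

Replacing the body of step with real command execution (for example over ssh to the blade) would turn the dry run into a working driver; keeping the printing variant around is handy for checking the order of operations before touching 42 images.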
Extra Options (Chapter 10)
Besides changing only configuration settings on a VM as mentioned in the Change a VM chapter, other activities can also be done.
For this solution the following options are also implemented.
- configure vnc
- configure ocfs2 within the guest, use a shared Oracle home to save space
- copy and configure software (like an oracle home or swingbench)
- create ASM disk files
- create OCR and Voting disks
- configure sudoers
- provide software for EM Agent Deployment
- configure ssh
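As an illustration of the "create ASM disk files" and "create OCR and Voting disks" options, file-based shared storage can be prepared as sparse files on the VM server. This is a sketch under assumptions: the function name create_disk_file, the file names and the sizes are all hypothetical, and attaching the files to the guests through vm.cfg is not shown.

```shell
# Create a sparse file of size_mb megabytes to act as a file-backed disk.
# With count=0 and seek=N, dd writes no data and only sets the file length,
# so the file takes almost no space until the guest starts writing to it.
create_disk_file() {
    local path=$1 size_mb=$2
    dd if=/dev/zero of="$path" bs=1M count=0 seek="$size_mb" 2>/dev/null
}

# Hypothetical example: one small disk file in a temporary directory.
demo_dir=$(mktemp -d)
create_disk_file "$demo_dir/asm_disk01.img" 100
ls -l "$demo_dir/asm_disk01.img"
rm -rf "$demo_dir"
```

Sparse files keep the restore from NFS and the local disk usage small, which matters when the same set of disk files has to be created for every grid on every blade.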
Rene Kundersma Oracle Expert Services, The Netherlands