Tuesday Jan 22, 2013

oracle vm 3.2.1 released!

Pleased to announce the release of Oracle VM 3.2.1

The press release is here. The documentation library can be found here.

The release notes in the documentation show what's new and also a list of bugs fixed. Here's the summary of what's new:

The new features and enhancements in Oracle VM Release 3.2.1 include:

Performance, Scalability and Security

Support for Oracle VM Server for SPARC: Oracle VM Manager can now be used to discover SPARC servers running Oracle VM Server for SPARC, and perform virtual machine management tasks.

New Dom0 Kernel in Oracle VM Server for x86: The Dom0 kernel in Oracle VM Server for x86 has been updated so that it is now the same Oracle Unbreakable Enterprise Kernel 2 (UEK2) as used in Oracle Linux, for complete binary compatibility with drivers supported in Oracle Linux. Due to the specialized nature of the Oracle VM Dom0 environment (as opposed to the more general purpose Oracle Linux environment) some Linux drivers may not be appropriate to support in the context of Oracle VM, even if the driver is fully compatible with the UEK2 kernel in Oracle Linux. Do not install any additional drivers unless directed to do so by Oracle Support Services.

Installation

MySQL Database Support: MySQL Database is used as the bundled database for the Oracle VM Manager management repository for simple installations. Support for an existing Oracle SE/EE Database is still included within the installer so that you can perform a custom installation to take advantage of your existing infrastructure. Simple installation using the bundled MySQL Database is fully supported within production environments.

Discontinued inclusion of Oracle XE Databases: Oracle VM Manager no longer bundles the Oracle XE database as a backend database. If you are currently running Oracle VM Manager using Oracle XE and you intend to upgrade you must first migrate your database to Oracle SE or Oracle EE.

Oracle VM Server Support Tools: A meta-package is provided on the Oracle VM Server ISO enabling you to install packages to assist with support. These packages are not installed automatically, as Oracle VM Server does not depend on them. Installation of the meta-package and its dependencies may assist with the resolution of support queries, and you can install it at your own discretion. Note that the sudo package was previously installed as a dependency for Oracle VM Server, but this package has now been made a dependency of the ovs-support-tools meta-package. If you require sudo on your Oracle VM Server installations, you should install the ovs-support-tools meta-package.

Improved Usability

Oracle VM Command Line Interface (CLI): The new Oracle VM Command Line Interface can be used to perform the same functions as the Oracle VM Manager Web Interface, such as managing all your server pools, servers and guests. The CLI commands can be scripted and run in conjunction with the Web Interface, thus bringing more flexibility to help you deploy and manage an Oracle VM environment. The CLI supports public-key authentication, allowing users to write scripts without embedding passwords, to facilitate secure remote login to Oracle VM Manager. The CLI also includes a full audit log for all commands executed using the facility. See the Oracle VM Command Line Interface User's Guide for information on using the CLI.

Accessibility options: Options to display the UI in a more accessible way for screen readers, improve the contrast, or increase the font size. See Oracle VM Manager user interface Accessibility Features for more information.

Health tab: Monitor the overall health and status of your virtualization environment and view historical statistics such as memory and CPU usage. See Health Tab for information on using the Health tab.

Multi-select of objects: Select one or more objects to perform an action on multiple objects, for example, upgrading multiple Oracle VM Servers in one step, rather than upgrading them individually. See Multi-Select Functionality for information on using the multi-select feature.

Search for objects: In many of the tab management panes and in some of the dialog boxes you can search for objects. This is of particular benefit to large deployments with many objects such as virtual machines or Oracle VM Servers. See Name Filters for information on using the search feature.

Tagging of objects: It is now possible to tag virtual machines, servers and server pool objects within Oracle VM Manager to create logical groupings of items, making it easier to search for objects by tag.

Alphabetized tables and other UI listings: Items listed in tables and other UI listings are now sorted alphabetically within Oracle VM Manager by default, to make it easier to find objects in larger deployments.

Present repository to server pools: In addition to presenting a storage repository to individual Oracle VM Servers, you can now present a repository to all Oracle VM Servers in one or more server pools. See Presenting or Unpresenting a Storage Repository for more information.

OCFS2 timeout configuration: An additional attribute has been added to allow you to determine the timeout in seconds for a cluster when configuring a clustered server pool within Oracle VM Manager.

NFS refresh servers and access lists for non-uniform exports: For NFS configurations where different server pools are exposed to different exports, it is now possible to configure non-uniform exports and access lists to control how server pool refreshes are performed. For more information on this feature, please see NFS Access Groups for Non-uniform Exports.

Configure multiple iSCSI access hosts: You can now configure multiple access hosts for iSCSI storage devices.

Sizes of disks, ISOs and vdisks: Oracle VM Manager now shows the sizes of disks, ISOs and vdisks within the virtual machine edit dialog, to make it easier to select a disk.

Automated backups and easy restore: Oracle VM Manager installations taking advantage of the bundled MySQL Enterprise Edition Database include fully automated database backups and a quick restore tool that can help with easy database restoration.

Serial console access: A serial console java applet has been included within Oracle VM Manager to allow serial console access to virtual machines running on both SPARC and x86 hardware. This facility complements the existing VNC-based console access to virtual machines running on x86 hardware.

Set preferences for recurring jobs: Facilities have been provided within Oracle VM Manager to control the preferences for recurring jobs. These include the ability to enable, disable or set the interval for tasks such as refreshing repositories and file systems; and to control the Yum Update checking task.

Processor Compatibility Groups: Since virtual machines can only be migrated between servers that use compatible processor types, Oracle VM Manager now provides the ability to define Processor Compatibility Groups to enable you to pick which servers a virtual machine can be migrated between.

Configure additional Utility and Virtual Machine roles: New roles are now supported on Oracle VM Servers to control the type of functionality that the server will be responsible for. The Virtual Machine role is required in order for an Oracle VM Server to run a virtual machine. Oracle VM Servers configured with the Utility role are favoured for performing operations such as file cloning, importing of templates, the creation of repositories, and other operations not directly related to running a virtual machine.

Directly import a virtual machine: It is now possible to directly import a virtual machine using Oracle VM Manager, no longer requiring that you first import to a template and then clone.

Virtual machine start policy: You can now specify a start policy for a virtual machine, determining whether to always start the virtual machine on the server on which it has been placed, or to start the virtual machine on the best possible server in the server pool.

Hot-add a VNIC to a virtual machine: It is now possible to add a VNIC directly to a running virtual machine from within Oracle VM Manager.

Send messages to a virtual machine: Facilities have been provided within Oracle VM Manager to send messages directly to a virtual machine in the form of key-value pairs.

NTP configuration: Ensuring that time is synchronized across all servers is important. Oracle VM Manager now provides a facility to bulk configure NTP across all servers.


My personal favorites are (1) MySQL as a repository database, (2) support in Oracle VM Manager for SPARC servers running Oracle VM Server for SPARC, (3) the CLI server, (4) the Utility versus Virtual Machine server roles, (5) cluster timeout configuration (and a better default), (6) direct VM import and (7) serial console access to a VM.

have fun

Monday Jan 21, 2013

oracle linux playground channel sample

If you have a system with Oracle Linux 6 installed but you are not using public-yum, and you want to play with our mainline kernel builds from the playground channel, then you need to create a simple, small yum repo file and you are all set.

Some reasons could be that your system is configured for a local yum repository for updates, or you are registered directly with ULN.

Either way, a very simple example file can be found here. Just put the file in /etc/yum.repos.d.

# cat /etc/yum.repos.d/playground.repo 
[ol6_playground]
name=Oracle Linux mainline kernel playground $releasever ($basearch)
baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL6/playground/latest/$basearch/
gpgkey=http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol6
gpgcheck=1
enabled=1

Once this file exists, you can use yum to install the new kernels. At the time of writing, this is kernel-3.7.2-3.7.y.20130115.ol6.x86_64. Just go look in the directory to see which kernels have been published and pick the one you want to install. As you can see, source, binary, devel, debug, headers, firmware and doc versions of the packages are there.

# yum install kernel-3.7.2-3.7.y.20130115.ol6.x86_64
Loaded plugins: refresh-packagekit, rhnplugin, security
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package kernel.x86_64 0:3.7.2-3.7.y.20130115.ol6 will be installed
--> Processing Dependency: kernel-firmware = 3.7.2-3.7.y.20130115.ol6 
      for package: kernel-3.7.2-3.7.y.20130115.ol6.x86_64
--> Running transaction check
---> Package kernel-firmware.noarch 0:2.6.32-279.19.1.el6 will be updated
---> Package kernel-firmware.noarch 0:3.7.2-3.7.y.20130115.ol6 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package           Arch     Version                      Repository        Size
================================================================================
Installing:
 kernel            x86_64   3.7.2-3.7.y.20130115.ol6     ol6_playground    24 M
Updating for dependencies:
 kernel-firmware   noarch   3.7.2-3.7.y.20130115.ol6     ol6_playground   997 k

Transaction Summary
================================================================================
Install       1 Package(s)
Upgrade       1 Package(s)

Total download size: 25 M
Is this ok [y/N]: y
Downloading Packages:
(1/2): kernel-3.7.2-3.7.y.20130115.ol6.x86_64.rpm  
                                      |  24 MB     00:18     
(2/2): kernel-firmware-3.7.2-3.7.y.20130115.ol6.noarch.rpm   
                                      | 997 kB     00:00     
--------------------------------------------------------------
Total                                                             
                             1.3 MB/s |  25 MB     00:19     
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Updating   : kernel-firmware-3.7.2-3.7.y.20130115.ol6.noarch            
                                1/3 
  Installing : kernel-3.7.2-3.7.y.20130115.ol6.x86_64                      
                                2/3 
  Cleanup    : kernel-firmware-2.6.32-279.19.1.el6.noarch                   
                                3/3 
  Verifying  : kernel-firmware-3.7.2-3.7.y.20130115.ol6.noarch                     
                                1/3 
  Verifying  : kernel-3.7.2-3.7.y.20130115.ol6.x86_64                             
                                2/3 
  Verifying  : kernel-firmware-2.6.32-279.19.1.el6.noarch                          
                                3/3 

Installed:
  kernel.x86_64 0:3.7.2-3.7.y.20130115.ol6                                                                    

Dependency Updated:
  kernel-firmware.noarch 0:3.7.2-3.7.y.20130115.ol6                                                           

Complete!
Now just a simple reboot and you are all set.

Wednesday Jan 16, 2013

Oracle Linux 5.9

Oracle Linux 5.9 was uploaded yesterday to http://linux.oracle.com (ULN) and to http://public-yum.oracle.com. The _latest channels are current and the 5.9_base channels contain the core.

ISO images will be available shortly from http://edelivery.oracle.com. If there is an urgent need to get the ISOs through My Oracle Support, simply file a service request.

Release notes are here.

Sunday Jan 06, 2013

oracle vm template config script example

The programmatic way to extend Oracle VM Template Configure is to build your own module.

To write your own module, you have to build an RPM that contains a configure script in a specific format. Let's go through the steps to do this.

Oracle VM template configure works very similarly to the init.d and chkconfig script model. For template config we have the /etc/template.d directory; all the scripts go into /etc/template.d/scripts. Symlinks are then made to other subdirectories based on the type of target the scripts provide. At this point we handle configure and cleanup. When a script/module gets added using ovm-chkconfig, the header of the script is read to verify the name, priority and targets, and then a symlink is made to the corresponding subdirectories under /etc/template.d.

As an example, you have /etc/init.d/sshd, which is the main sshd initscript, and when sshd is enabled you will find a symlink /etc/rc3.d/S55sshd pointing to /etc/init.d/sshd. These symlinks are created by chkconfig when you enable or disable a service. The same thing goes for Oracle VM template config and the content of /etc/template.d/scripts. You will see /etc/template.d/scripts/ssh, and since ssh (on my system) is enabled for the configure target, I have a symlink /etc/template.d/configure.d/70ssh pointing to it.

Like init.d, the digit in front of the script name specifies the priority at which it should be run.

The most important and complex part is writing your own script for your own application. Our scripts are in Python; theoretically you could write yours in a different language, as long as the input, output and argument handling remain the same. The examples here will all be in Python. Each script has two main parts: (1) the script header, which contains information like script name, targets, priorities and description, and (2) the actual script, which has to handle a small set of parameters. You can take a look at the existing scripts for examples.

(1) script header
Aside from a copyright header that suits your needs, the script headers require a very specific comment block. Here is an example:

### BEGIN PLUGIN INFO
# name: network
# configure: 50
# cleanup: 50
# description: Script to configure template network.
### END PLUGIN INFO

You have to use the exact same format. Provide your own script name (which will be used when calling ovm-chkconfig), the targets (right now we implement configure and cleanup) and the priority for your script. The priority specifies in what order the scripts get executed. You do not have to implement all targets: if you have a configure target but not cleanup, that is OK, and the same goes for cleanup without configure. It is up to you. The configure target gets called on the first boot/initial start of the VM; cleanup happens when you manually initiate a cleanup in your VM or when you want to restore the VM to its original state.

### BEGIN PLUGIN INFO
# name: [script name]
# [target]: [priority]
# [target]: [priority]
# description: a description and can
#   cross multiple lines.
### END PLUGIN INFO

Now for the body of the script. Basically, the main requirement is that it accepts a [target] parameter. Let's say we have a script called foo that needs to be run at configure time; then the script (in /etc/template.d/scripts) will have to accept and handle the parameter configure. If you also want to call it for cleanup, then it has to handle cleanup. You can have your script handle any other arguments as well; this is totally up to you, and they are optional for our purposes. There is one optional parameter which is useful to implement, and that is -e or --enumerate. ovm-template-config uses this to enumerate the parameters for a target for your script.

Here is the firewall example:

# ovm-template-config --human-readable --enumerate configure --script firewall
[('41',
  'firewall',
  [{u'description': u'Whether to enable network firewall: True or False.',
    u'hidden': True,
    u'key': u'com.oracle.linux.network.firewall'}])]
And if you run the script manually:

# ./firewall configure -e
[{"hidden": true, "description": "Whether to enable network firewall: True or False.", "key": "com.oracle.linux.network.firewall"}]

In other words, the firewall script lists the parameters it expects when run as a configure target.

Now here is an example of the script body, in Python. It implements the configure and cleanup targets and handles the enumerate argument. Part of the magic is handled in templateconfig.cli.

try:
    import json
except ImportError:
    import simplejson as json
from templateconfig.cli import main


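# do_enumerate returns, as a JSON string, the list of key descriptions this script expects for the given target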
def do_enumerate(target):
    param = []
    if target == 'configure':
        param += []
    elif target == 'cleanup':
        param += []
    return json.dumps(param)


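# each handler below receives the key/value pairs as a JSON string and returns them (possibly updated) as JSON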
def do_configure(param):
    param = json.loads(param)
    return json.dumps(param)


def do_unconfigure(param):
    param = json.loads(param)
    return json.dumps(param)


def do_cleanup(param):
    param = json.loads(param)
    return json.dumps(param)


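# main() from templateconfig.cli handles the argument parsing and dispatches to the handlers registered here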
if __name__ == '__main__':
    main(do_enumerate, {'configure': do_configure, 'cleanup': do_cleanup})

So now you can fill this out with your own parameters and code. Again taking the firewall script as an example, to add the expected keys:

def do_enumerate(target):
    param = []
    if target == 'configure':
        param += [{'key': 'com.oracle.linux.network.firewall',
                   'description': 'Whether to enable network firewall: True or False.',
                   'hidden': True}]
    return json.dumps(param)

The above shows that this script expects the key com.oracle.linux.network.firewall to be set and what the default is, along with a description. Add this for each key/value pair that you expect for your script, and afterwards it is easy to understand what the input to your script needs to be, again by running ovm-template-config.

To execute actions at configure time, based on the values set, here's a do_configure() example (shell_cmd is a small helper used by the stock template config scripts to run a shell command):

def do_configure(param):
    param = json.loads(param)
    firewall = param.get('com.oracle.linux.network.firewall')
    if firewall == 'True':
        shell_cmd('service iptables start')
        shell_cmd('service ip6tables start')
        shell_cmd('chkconfig --level 2345 iptables on')
        shell_cmd('chkconfig --level 2345 ip6tables on')
    elif firewall == 'False':
        shell_cmd('service iptables stop')
        shell_cmd('service ip6tables stop')
        shell_cmd('chkconfig --level 2345 iptables off')
        shell_cmd('chkconfig --level 2345 ip6tables off')
    return json.dumps(param)

When the script is called, you can use param.get() to retrieve key/value variables and then just make use of them. Just like in the firewall example, you can do whatever you want: call out to other commands, add more Python code, it's up to you...

It is also possible to alter keys or add new keys which then get sent back. So if you want your script to communicate values back, which can be retrieved later through the manager API (for instance with ovm_vmmessage -q), you can simply do this:

param['key'] = 'some value'

The key can be an existing key or a new one.
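As a small illustration of this, building on the skeleton above, a configure handler can report a value back simply by adding it to the dictionary before returning it. A minimal sketch (the key name com.example.myapp.status is purely illustrative, not one of the predefined keys):

def do_configure(param):
    param = json.loads(param)
    # ... do the actual configuration work here ...
    # report a value back; it can later be queried with ovm_vmmessage -q
    param['com.example.myapp.status'] = 'configured'
    return json.dumps(param)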

And that's really it... for the script. Next up is packaging.

In order to install and configure these template configure scripts, they have to be packaged in an RPM with a specific naming convention. Package the script(s) (there can be more than one) as ovm-template-config-[scriptname]. Ideally, in the post-install of the RPM, you want to add the script automatically by executing # /usr/sbin/ovm-chkconfig --add [scriptname]. When de-installing a script/RPM, remove it at uninstall time with # /usr/sbin/ovm-chkconfig --del [scriptname].

Here is an example of an RPM spec file that can be used:

Name: ovm-template-config-example
Version: 3.0
Release: 1%{?dist}
Summary: Oracle VM template example configuration script.
Group: Applications/System
License: GPL
URL: http://www.oracle.com/virtualization
Source0: %{name}-%{version}.tar.gz
BuildRoot: %(mktemp -ud %{_tmppath}/%{name}-%{version}-%{release}-XXXXXX)
BuildArch: noarch
Requires: ovm-template-config

%description
Oracle VM template example configuration script.

%prep
%setup -q

%install
rm -rf $RPM_BUILD_ROOT
make install DESTDIR=$RPM_BUILD_ROOT

%clean
rm -rf $RPM_BUILD_ROOT

%post
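# $1 is 1 on a fresh install (2 or higher on an upgrade)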
if [ $1 = 1 ]; then
    /usr/sbin/ovm-chkconfig --add example
fi

%preun
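# $1 is 0 when the package is removed completely (1 on an upgrade)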
if [ $1 = 0 ]; then
    /usr/sbin/ovm-chkconfig --del example
fi

%files
%defattr(-,root,root,-)
%{_sysconfdir}/template.d/scripts/example

%changelog
* Tue Mar 22 2011 Zhigang Wang  - 3.0-1
- Initial build.

Modify the content to your liking, change the name example to your script name, and add whatever other dependencies you might have or whatever files need to be bundled along with this. If you want to bundle executables or scripts that live in other locations, that's allowed. As you can see from the spec file, it automatically calls ovm-chkconfig --add and --del at post-install and pre-uninstall time of the RPM.

In order to create RPMs, you have to install rpmbuild: # yum install rpm-build.

To make it easy, here's a Makefile you can use to help automate all of this:

DESTDIR=
PACKAGE=ovm-template-config-example
VERSION=3.0

help:
	@echo 'Commonly used make targets:'
	@echo '  install    - install program'
	@echo '  dist       - create a source tarball'
	@echo '  rpm        - build RPM packages'
	@echo '  clean      - remove files created by other targets'

dist: clean
	mkdir $(PACKAGE)-$(VERSION)
	tar -cSp --to-stdout --exclude .svn --exclude .hg --exclude .hgignore \
	    --exclude $(PACKAGE)-$(VERSION) * | tar -x -C $(PACKAGE)-$(VERSION)
	tar -czSpf $(PACKAGE)-$(VERSION).tar.gz $(PACKAGE)-$(VERSION)
	rm -rf $(PACKAGE)-$(VERSION)

install:
	install -D example $(DESTDIR)/etc/template.d/scripts/example

rpm: dist
	rpmbuild -ta $(PACKAGE)-$(VERSION).tar.gz

clean:
	rm -fr $(PACKAGE)-$(VERSION)
	find . -name '*.py[cdo]' -exec rm -f '{}' ';'
	rm -f *.tar.gz

.PHONY: dist install rpm clean

Create a directory and copy over your script, the spec file and this Makefile. Run # make dist to create a source tarball of your code and then # make rpm. This will generate an RPM in the RPMS/noarch directory. For instance: /root/rpmbuild/RPMS/noarch/ovm-template-config-test-3.0-1.el6.noarch.rpm

Next you can take this RPM and install it on a target system.

# rpm -ivh  /root/rpmbuild/RPMS/noarch/ovm-template-config-test-3.0-1.el6.noarch.rpm
Preparing...                ########################################### [100%]
   1:ovm-template-config-tes########################################### [100%]

And as you can see, it's added to the ovm-chkconfig list:

# ovm-chkconfig --list | grep test
test                 on:75       off         off         on:25       off         off         off         off

One point of caution: the configure scripts get executed very early in the boot stage. ovmd is executed as S00ovmd, which is well before many other services are (1) configured and (2) running. So if your product requires services like network connectivity to be up and running, then you have to split the configuration into two parts. First, use the above to gather configuration data remotely and store it in a way that you can use later, then add your own /etc/init.d scripts which consume this data afterwards. That way your own init scripts are executed at a later stage, when the services you depend on are available.
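As a sketch of that split, the configure handler below only stores the received value in a file for later use; a regular init script that you write yourself can then pick it up once networking and the other services it needs are available. The key name com.example.myapp.dbhost and the file /etc/sysconfig/myapp-ovmconfig are purely illustrative.

def do_configure(param):
    param = json.loads(param)
    # do not try to reach the network here; this runs before most services are up
    dbhost = param.get('com.example.myapp.dbhost')
    if dbhost:
        # stash the value for a normal init.d script to consume later in the boot
        config = open('/etc/sysconfig/myapp-ovmconfig', 'w')
        config.write('DBHOST=%s\n' % dbhost)
        config.close()
    return json.dumps(param)

Your own /etc/init.d/myapp script, running at a normal priority, can then source that file and do the work that needs the network.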

That's really all there is to it. Thanks to Zhigang for the example code I have used here.

oracle vm messages

Using the Oracle VM Message API for your own applications...

There are two ways to communicate through the APIs: a quick and easy one and a more comprehensive one.

The quick and easy method is just sending and receiving messages.

  • Sending messages using ovm_utils or using the Oracle VM CLI to the VM
  • # ssh admin@localhost -p 10000
    admin@localhost's password: 
    OVM> sendVmMessage Vm name=ol6u3apitest key=foo message=bar log=no
    Command: sendVmMessage Vm name=ol6u3apitest key=foo message=bar log=no
    Status: Success
    Time: 2012-12-27 09:04:29,890 PST
    

    or

    # ./ovm_vmmessage -u admin -p ######## -h localhost -v ol6u3apitest -k foo -V bar
    Oracle VM VM Message utility 0.5.2.
    Connected.
    VM : 'ol6u3apitest' has status :  Running.
    Sending message.
    Message sent successfully.
    

    So both of the examples send a key/value pair of foo=bar to the VM.

  • Receiving messages on the VM side using ovmd
  • Inside the VM, you can use the ovmd executable to retrieve and send messages back.

    # ovmd -l 
    

    lists all currently set key/value pairs

    # ovmd -g key 
    
    get a value from inside the VM

    # ovmd -r key
    
    delete a key out of the current cache

    # ovmd -x
    
    delete all key/value pairs currently set in the cache

    So in the case of the message sent through the manager above, you can see it in the VM:

    # ovmd -l
    {"foo":"bar"}
    {"mykey":"myvalue"}
    
    # ovmd -g mykey
    myvalue
    
    # ovmd -r mykey
    # ovmd -l
    {"foo":"bar"}
    

  • Setting key/value pairs inside the VM and retrieving them through the manager
  • # ovmd -p key=value
    
    set a key/value pair inside the VM

    # ovmd -p newkey=newvalue
    # ovmd -l
    {"foo":"bar"}
    {"newkey":"newvalue"}
    

    Use ovm_vmmessage with the -q option and the key name to query and retrieve a value that was set.

    ovm_vmmessage will also report when this key was set inside the VM.

    # ./ovm_vmmessage -u admin -p Manager1 -h localhost -v ol6u3apitest -q newkey
    Oracle VM VM Message utility 0.5.2.
    Connected.
    VM : 'ol6u3apitest' has status :  Running.
    Querying for key 'newkey'.
    Query successful.
    Query for Key : 'newkey' returned value 'newvalue'.
    Key set 2 minutes ago.
    

    So with just these simple tools it's possible to set up a model where you send messages from your application outside of a VM to the VM through the OVMAPI, and also send messages from an application inside a VM back. You can write your own daemon process that runs and queries for values (a minimal sketch follows below) or just do it manually. A recommendation would be to create a naming convention for your product. For instance, for the Oracle VM template configuration we use com.oracle.linux.[values]. You could consider something similar, or just [application].[key], or whatever you want. The maximum size of the total message is 8 KB.

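    As a sketch of the daemon approach mentioned above, the loop below shells out to ovmd to poll for a key and acts when a value shows up. The key name com.example.myapp.command and the 10 second interval are purely illustrative.

    #!/usr/bin/python
    import subprocess
    import time

    KEY = 'com.example.myapp.command'   # illustrative key name

    while True:
        # ask ovmd for the current value of our key
        proc = subprocess.Popen(['ovmd', '-g', KEY], stdout=subprocess.PIPE)
        value = proc.communicate()[0].strip()
        if value:
            # act on the value here, then drop the key from the cache
            subprocess.call(['ovmd', '-r', KEY])
        time.sleep(10)
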
    Next up will be extending template config for your own application.

    Saturday Jan 05, 2013

    Oracle Linux Ksplice offline client

    We just uploaded the Ksplice Uptrack offline edition client to ULN. Until recently, in order to rely on Ksplice zero-downtime patches (you know, the ability to apply security updates and bugfixes on Oracle Linux without the need for a reboot), each server made a direct connection to our servers: each server on the intranet was required to have a direct connection to linux.oracle.com.

    By introducing the offline client, customers with Oracle Linux Premier or Oracle Linux Premier Limited support can create a local intranet yum repository that mirrors the ULN Ksplice channel, and just use the yum update command to install the latest Ksplice updates. This allows customers to have just one server connected to the Oracle server, while every other system only needs a local connection.

    Here is an example of how to get this going; setting up a local yum repository is left as an exercise for the reader :)

  • Register a server with ULN (using # uln_register); this requires Premier Oracle Linux support
  • Configure rebootless updates with # uln_register
  • Log into ULN and modify the channel subscriptions for the server to include the channel Ksplice for Oracle Linux 6
  • Verify that you are subscribed by running # yum repolist on your server; you should see the ol6_x86_64_ksplice channel included
  • Install the offline edition of uptrack, the Ksplice update client: # yum install uptrack-offline
  • If there are Ksplice updates available for the kernel you run, you can install the uptrack-updates-[version] RPM: # yum install uptrack-updates-`uname -r`.(i386|x86_64)

    Now, as we release new zero-downtime updates, there will be a newer version of the uptrack-updates RPM for your kernel. A simple # yum update will pick this up (without the need for a reboot) and apply the new Ksplice updates. You can see the version of uptrack-updates by doing # rpm -qa | grep uptrack-updates; it will show the date appended at the end. And of course the usual commands # uptrack-show or # uptrack-uname -r work as well.

    The offline client allows you to create a local yum repository for your servers that are covered under Oracle Linux Premier and Oracle Linux Premier Limited support subscriptions. The servers just point to your local yum repository; all you have to do is keep this repository up to date for the Ksplice channel, install the above RPMs on each server and run a yum update. In other words, the offline client doesn't require each server to be connected to the Oracle Unbreakable Linux Network.

    To set up a local yum repository, follow these instructions.

    have fun!

  • Using Oracle VM messages to configure a Virtual Machine.

    In the previous blog entry, I walked through the steps on how to set up a VM with the necessary packages to enable Oracle VM template configuration. The template configuration scripts are add-ons one can install inside a VM running in an Oracle VM 3 environment. Once installed, it is possible to enable the configuration scripts and shut down the VM so that after cloning or reboot, we go through an initial setup dialog.

    At startup time, if ovmd is enabled, it will start executing configuration scripts that need input to configure and continue. It is possible to send this configuration data through the virtual console of the VM or through the Oracle VM API. To use the Oracle VM API to send configuration messages, you have two options:

    (1) use the Oracle VM CLI. As of Oracle VM 3.1, we include an Oracle VM CLI server by default when installing Oracle VM Manager. This process listens on port 10000 on the Oracle VM Manager node and acts as an ssh server. You can log into this CLI using the admin username/password and then execute CLI commands.

    # ssh admin@localhost -p 10000
    admin@localhost's password: 
    OVM> sendVmMessage Vm name=ol6u3apitest key=foo message=bar log=no
    Command: sendVmMessage Vm name=ol6u3apitest key=foo message=bar log=no
    Status: Success
    Time: 2012-12-27 09:04:29,890 PST
    

    The CLI command for sending a message is sendVmMessage Vm name=[vmname] key=[key] message=[value]

    If you do not want the output of the command to be logged, add log=no.

    (2) use the Oracle VM utilities. If you install the Oracle VM Utilities (see here to get started), then:

    # ./ovm_vmmessage -u admin -p ######## -h localhost -v ol6u3apitest -k foo -V bar
    Oracle VM VM Message utility 0.5.2.
    Connected.
    VM : 'ol6u3apitest' has status :  Running.
    Sending message.
    Message sent successfully.
    

    The ovm_vmmessage command connects to Oracle VM Manager and sends a key/value pair to the VM you select.

    ovm_vmmessage -u [adminuser] -p [adminpassword] -h [managernode] -v [vmname] -k [key] -V [value]

    These two commands basically allow the admin user to send simple key/value pair messages to a given VM. This is the basic mechanism we rely on to remotely configure a VM using the Oracle VM template config scripts.

    For the template configuration we provide, and depending on the scripts you installed, there is a well-defined set of variables (keys) that you can set, listed below. In our scripts there is one variable that is required, and it has to be set/sent at the end of the configuration: the root password. Everything else is optional. Sending the root password variable triggers the reconfiguration to execute. As an example, if you install the ovm-template-config-selinux package, then part of the configuration can be to set the SELinux mode. The variable is com.oracle.linux.selinux.mode and the values can be enforcing, permissive or disabled. So to set the value of SELinux, you basically send a message with key com.oracle.linux.selinux.mode and value enforcing (or one of the others).

    # ./ovm_vmmessage -u admin -p ######## -h localhost -v ol6u3apitest \
            -k com.oracle.linux.selinux.mode -V enforcing
    

    Do this for every variable you want to define and at the end send the root password.

    # ./ovm_vmmessage -u admin -p ######## -h localhost -v ol6u3apitest \ 
            -k com.oracle.linux.root-password -V "mypassword"
    

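    If you have several variables to set, a small wrapper can send them in a loop and finish with the root password to trigger the reconfiguration. A minimal sketch (not part of the Oracle VM utilities), assuming ovm_vmmessage sits in the current directory; the credentials and values are purely illustrative:

    #!/usr/bin/python
    import subprocess

    VM = 'ol6u3apitest'
    KEYS = {
        'com.oracle.linux.selinux.mode': 'enforcing',
        'com.oracle.linux.network.firewall': 'True',
    }

    def send(key, value):
        # shell out to the ovm_vmmessage utility for each key/value pair
        subprocess.check_call(['./ovm_vmmessage', '-u', 'admin', '-p', 'password',
                               '-h', 'localhost', '-v', VM, '-k', key, '-V', value])

    for key, value in KEYS.items():
        send(key, value)

    # the root password is required and has to be sent last;
    # it triggers the actual reconfiguration of the VM
    send('com.oracle.linux.root-password', 'mypassword')
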
    Once the root password message gets sent, the ovm-template-config scripts will set up all the values and the VM will end up in a configured state. You can use this to send ssh keys, set up extra users, configure the virtual network devices, etc. To get the list of configuration variables, just run # ovm-template-config --human-readable --enumerate configure and it will list the variables with a description, as in the listing at the end of this entry.

    It is also possible to selectively enable and disable scripts. This works very similarly to chkconfig. # ovm-chkconfig --list will show which scripts/modules are registered and whether they are enabled to run at configure time and/or cleanup time. At this point, the other options are not implemented (suspend/resume/..). If you have installed datetime but do not want to have it run or be an option, then a simple # ovm-chkconfig --target configure datetime off will disable it. This allows you, for each VM or template, to selectively enable or disable configuration options. If you disable a module, the output of ovm-template-config will reflect those changes.

    The next blog entry will talk about how to make generic use of the VM message API and possibly extend the ovm-template-config modules for your own applications.

    [('30',
      'selinux',
      [{u'description': u'SELinux mode: enforcing, permissive or disabled.',
        u'hidden': True,
        u'key': u'com.oracle.linux.selinux.mode'}]),
     ('41',
      'firewall',
      [{u'description': u'Whether to enable network firewall: True or False.',
        u'hidden': True,
        u'key': u'com.oracle.linux.network.firewall'}]),
     ('50',
      'datetime',
      [{u'description': u'System date and time in format year-month-day-hour-minute-second, e.g., "2011-4-7-9-2-42".',
        u'hidden': True,
        u'key': u'com.oracle.linux.datetime.datetime'},
       {u'description': u'System time zone, e.g., "America/New_York".',
        u'hidden': True,
        u'key': u'com.oracle.linux.datetime.timezone'},
       {u'description': u'Whether to keep hardware clock in UTC: True or False.',
        u'hidden': True,
        u'key': u'com.oracle.linux.datetime.utc'},
       {u'description': u'Whether to enable NTP service: True or False.',
        u'hidden': True,
        u'key': u'com.oracle.linux.datetime.ntp'},
       {u'description': u'NTP servers separated by comma, e.g., "time.example.com,0.example.pool.ntp.org".',
        u'hidden': True,
        u'key': u'com.oracle.linux.datetime.ntp-servers'},
       {u'description': u'Whether to enable NTP local time source: True or False.',
        u'hidden': True,
        u'key': u'com.oracle.linux.datetime.ntp-local-time-source'}]),
     ('50',
      'network',
      [{u'description': u'System host name, e.g., "localhost.localdomain".',
        u'key': u'com.oracle.linux.network.hostname'},
       {u'description': u'Hostname entry for /etc/hosts, e.g., "127.0.0.1 localhost.localdomain localhost".',
        u'hidden': True,
        u'key': u'com.oracle.linux.network.host.0'},
       {u'description': u'Network device to configure, e.g., "eth0".',
        u'key': u'com.oracle.linux.network.device.0'},
       {u'depends': u'com.oracle.linux.network.device.0',
        u'description': u'Network device hardware address, e.g., "00:16:3E:28:0F:4E".',
        u'hidden': True,
        u'key': u'com.oracle.linux.network.hwaddr.0'},
       {u'depends': u'com.oracle.linux.network.device.0',
        u'description': u'Network device MTU, e.g., "1500".',
        u'hidden': True,
        u'key': u'com.oracle.linux.network.mtu.0'},
       {u'choices': [u'yes', u'no'],
        u'depends': u'com.oracle.linux.network.device.0',
        u'description': u'Activate interface on system boot: yes or no.',
        u'key': u'com.oracle.linux.network.onboot.0'},
       {u'choices': [u'dhcp', u'static'],
        u'depends': u'com.oracle.linux.network.device.0',
        u'description': u'Boot protocol: dhcp or static.',
        u'key': u'com.oracle.linux.network.bootproto.0'},
       {u'depends': u'com.oracle.linux.network.bootproto.0',
        u'description': u'IP address of the interface.',
        u'key': u'com.oracle.linux.network.ipaddr.0',
        u'requires': [u'com.oracle.linux.network.bootproto.0',
                      [u'static', u'none', None]]},
       {u'depends': u'com.oracle.linux.network.bootproto.0',
        u'description': u'Netmask of the interface.',
        u'key': u'com.oracle.linux.network.netmask.0',
        u'requires': [u'com.oracle.linux.network.bootproto.0',
                      [u'static', u'none', None]]},
       {u'depends': u'com.oracle.linux.network.bootproto.0',
        u'description': u'Gateway IP address.',
        u'key': u'com.oracle.linux.network.gateway.0',
        u'requires': [u'com.oracle.linux.network.bootproto.0',
                      [u'static', u'none', None]]},
       {u'depends': u'com.oracle.linux.network.bootproto.0',
        u'description': u'DNS servers separated by comma, e.g., "8.8.8.8,8.8.4.4".',
        u'key': u'com.oracle.linux.network.dns-servers.0',
        u'requires': [u'com.oracle.linux.network.bootproto.0',
                      [u'static', u'none', None]]},
       {u'description': u'DNS search domains separated by comma, e.g., "us.example.com,cn.example.com".',
        u'hidden': True,
        u'key': u'com.oracle.linux.network.dns-search-domains.0'},
       {u'description': u'Network device to configure, e.g., "eth0".',
        u'hidden': True,
        u'key': u'com.oracle.linux.network.device.1'},
       {u'depends': u'com.oracle.linux.network.device.1',
        u'description': u'Network device hardware address, e.g., "00:16:3E:28:0F:4E".',
        u'hidden': True,
        u'key': u'com.oracle.linux.network.hwaddr.1'},
       {u'depends': u'com.oracle.linux.network.device.1',
        u'description': u'Network device MTU, e.g., "1500".',
        u'hidden': True,
        u'key': u'com.oracle.linux.network.mtu.1'},
       {u'choices': [u'yes', u'no'],
        u'depends': u'com.oracle.linux.network.device.1',
        u'description': u'Activate interface on system boot: yes or no.',
        u'hidden': True,
        u'key': u'com.oracle.linux.network.onboot.1'},
       {u'choices': [u'dhcp', u'static'],
        u'depends': u'com.oracle.linux.network.device.1',
        u'description': u'Boot protocol: dhcp or static.',
        u'hidden': True,
        u'key': u'com.oracle.linux.network.bootproto.1'},
       {u'depends': u'com.oracle.linux.network.bootproto.1',
        u'description': u'IP address of the interface.',
        u'hidden': True,
        u'key': u'com.oracle.linux.network.ipaddr.1',
        u'requires': [u'com.oracle.linux.network.bootproto.1',
                      [u'static', u'none', None]]},
       {u'depends': u'com.oracle.linux.network.bootproto.1',
        u'description': u'Netmask of the interface.',
        u'hidden': True,
        u'key': u'com.oracle.linux.network.netmask.1',
        u'requires': [u'com.oracle.linux.network.bootproto.1',
                      [u'static', u'none', None]]},
       {u'depends': u'com.oracle.linux.network.bootproto.1',
        u'description': u'Gateway IP address.',
        u'hidden': True,
        u'key': u'com.oracle.linux.network.gateway.1',
        u'requires': [u'com.oracle.linux.network.bootproto.1',
                      [u'static', u'none', None]]},
       {u'depends': u'com.oracle.linux.network.bootproto.1',
        u'description': u'DNS servers separated by comma, e.g., "8.8.8.8,8.8.4.4".',
        u'hidden': True,
        u'key': u'com.oracle.linux.network.dns-servers.1',
        u'requires': [u'com.oracle.linux.network.bootproto.1',
                      [u'static', u'none', None]]},
       {u'description': u'DNS search domains separated by comma, e.g., "us.example.com,cn.example.com".',
        u'hidden': True,
        u'key': u'com.oracle.linux.network.dns-search-domains.1'}]),
     ('60',
      'user',
      [{u'description': u'Name of the user on which to perform operation.',
        u'hidden': True,
        u'key': u'com.oracle.linux.user.name.0'},
       {u'description': u'Action to perform on the user: add, del or mod.',
        u'hidden': True,
        u'key': u'com.oracle.linux.user.action.0'},
       {u'description': u'User ID.',
        u'hidden': True,
        u'key': u'com.oracle.linux.user.uid.0'},
       {u'description': u'User initial login group.',
        u'hidden': True,
        u'key': u'com.oracle.linux.user.group.0'},
       {u'description': u'Supplementary groups separated by comma.',
        u'hidden': True,
        u'key': u'com.oracle.linux.user.groups.0'},
       {u'description': u'User password.',
        u'hidden': True,
        u'key': u'com.oracle.linux.user.password.0',
        u'password': True},
       {u'description': u'New name of the user.',
        u'hidden': True,
        u'key': u'com.oracle.linux.user.new-name.0'},
       {u'description': u'Name of the group on which to perform operation.',
        u'hidden': True,
        u'key': u'com.oracle.linux.group.name.0'},
       {u'description': u'Action to perform on the group: add, del or mod.',
        u'hidden': True,
        u'key': u'com.oracle.linux.group.action.0'},
       {u'description': u'Group ID.',
        u'hidden': True,
        u'key': u'com.oracle.linux.group.gid.0'},
       {u'description': u'New name of the group.',
        u'hidden': True,
        u'key': u'com.oracle.linux.group.new-name.0'}]),
     ('70',
      'ssh',
      [{u'description': u'Host private rsa1 key for protocol version 1.',
        u'hidden': True,
        u'key': u'com.oracle.linux.ssh.host-key'},
       {u'description': u'Host public rsa1 key for protocol version 1.',
        u'hidden': True,
        u'key': u'com.oracle.linux.ssh.host-key-pub'},
       {u'description': u'Host private rsa key.',
        u'hidden': True,
        u'key': u'com.oracle.linux.ssh.host-rsa-key'},
       {u'description': u'Host public rsa key.',
        u'hidden': True,
        u'key': u'com.oracle.linux.ssh.host-rsa-key-pub'},
       {u'description': u'Host private dsa key.',
        u'hidden': True,
        u'key': u'com.oracle.linux.ssh.host-dsa-key'},
       {u'description': u'Host public dsa key.',
        u'hidden': True,
        u'key': u'com.oracle.linux.ssh.host-dsa-key-pub'},
       {u'description': u'Name of the user to add a key.',
        u'hidden': True,
        u'key': u'com.oracle.linux.ssh.user.0'},
       {u'description': u'Authorized public keys.',
        u'hidden': True,
        u'key': u'com.oracle.linux.ssh.authorized-keys.0'},
       {u'description': u'Private key for authentication.',
        u'hidden': True,
        u'key': u'com.oracle.linux.ssh.private-key.0'},
       {u'description': u'Private key type: rsa, dsa or rsa1.',
        u'hidden': True,
        u'key': u'com.oracle.linux.ssh.private-key-type.0'},
       {u'description': u'Known hosts.',
        u'hidden': True,
        u'key': u'com.oracle.linux.ssh.known-hosts.0'}]),
     ('90',
      'authentication',
      [{u'description': u'System root password.',
        u'key': u'com.oracle.linux.root-password',
        u'password': True,
        u'required': True}])]
    

    Configure Oracle Linux 6.3 as an Oracle VM template

    I have been asked a few times how one can make use of the Oracle VM API to configure an Oracle Linux VM running on top of Oracle VM 3. In the next few blog entries we will go through the various steps. This one will start at the beginning and get you to a completely prepared VM.

  • Create a VM with a default installation of Oracle Linux 6 update 3
  • You can freely download Oracle Linux installation images from http://edelivery.oracle.com/linux. Choose any type of installation you want, basic, desktop, server, minimal...

    Oracle Linux 6.3 comes with kernel 2.6.39-200.24.1 (UEK2)

    # uname -a
    Linux ol6u3 2.6.39-200.24.1.el6uek.x86_64 #1 SMP Sat Jun 23 02:39:07 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
    

  • Update the VM to the latest version of UEK and, as a general best practice, to the latest patches, then reboot the VM
  • Oracle Linux updates are freely available on our public-yum site and the default install of Oracle Linux 6.3 already points to this location for updates.

    # yum update 
    # reboot
    # uname -a
    Linux ol6u3 2.6.39-300.17.3.el6uek.x86_64 #1 SMP Wed Dec 19 06:28:03 PST 2012 x86_64 x86_64 x86_64 GNU/Linux
    

    There is an extra kernel module required for the Oracle VM API to work: the ovmapi kernel module provides the ability to pass messages back and forth between the host and the VM, and as such between Oracle VM Manager (through the VM API) and the VM. We included this kernel module in the 2.6.39-300 kernel to make it easy; there is no need to install extra kernel modules or keep them up to date when or if we release a new update. The source code for this kernel module is of course part of the UEK2 source tree.

  • Enable the Oracle Linux add-on channel
  • After reboot, download the latest public-yum repo file from public-yum, which contains more repositories, and enable the add-on channel, which contains the Oracle VM API packages:

    Inside the VM:

    # cd /etc/yum.repos.d
    # rm public-yum-ol6.repo    <- (replace the original version with this newer version)
    # wget http://public-yum.oracle.com/public-yum-ol6.repo
    

  • Edit the public-yum-ol6.repo file to enable the ol6_addons channel.
  • Find the ol6_addons section and change enabled=0 to enabled=1.

    [ol6_addons]
    name=Oracle Linux $releasever Add ons ($basearch)
    baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL6/addons/$basearch/
    gpgkey=http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol6
    gpgcheck=1
    enabled=1
    

    Save the file.

  • Install the Oracle VM API packages
  • # yum install ovmd xenstoreprovider python-simplejson ovm-template-config
    

    This installs the basic packages necessary on Oracle Linux 6 to support the Oracle VM API. xenstoreprovider is the library which communicates with the ovmapi kernel infrastructure. ovmd is a daemon that handles configuration and re-configuration events and provides a mechanism to send/receive messages between the VM and Oracle VM Manager.

  • Add additional configuration packages you want
  • In order to create a VM template that includes basic OS configuration scripts, you can decide to install any or all of the following:

    ovm-template-config-authentication : Oracle VM template authentication configuration script.
    ovm-template-config-datetime       : Oracle VM template datetime configuration script.
    ovm-template-config-firewall       : Oracle VM template firewall configuration script.
    ovm-template-config-network        : Oracle VM template network configuration script.
    ovm-template-config-selinux        : Oracle VM template selinux configuration script.
    ovm-template-config-ssh            : Oracle VM template ssh configuration script.
    ovm-template-config-system         : Oracle VM template system configuration script.
    ovm-template-config-user           : Oracle VM template user configuration script.
    

    Simply type # yum install ovm-template-config-... to install whichever you want.

  • Enable ovmd
  • To enable ovmd (recommended), do:

    # chkconfig ovmd on 
    # /etc/init.d/ovmd start
    
  • Prepare your VM for first boot configuration
  • If you want to shut down this VM and enable the first-boot configuration as a template, execute:

    # ovmd -s cleanup
    # service ovmd enable-initial-config
    # shutdown -h now
    

    After cloning this VM or starting it, it will act as a first-time-boot VM and will require configuration input through the VM API or on the virtual VM console.

    My next blog will go into detail on how to send messages through the Oracle VM API for remote configuration and also how to extend the scripts.

    Friday Jan 04, 2013

    dlmfs

    dlmfs is a really cool, nifty feature that is part of OCFS2. Basically, it's a virtual filesystem that allows a user/program to use the DLM through simple filesystem commands/manipulation. Without having to write programs that link with cluster libraries or do complex things, you can literally write a few lines of Python, Java or C code that let you create locks across a number of servers. We use this feature in Oracle VM to coordinate the master server and the locking of VMs across multiple nodes in a cluster. It allows us to make sure that a VM cannot start on multiple servers at once. Every VM is backed by a DLM lock, but by using dlmfs, this is simply a file in the dlmfs filesystem.

    To show you how easy and powerful this is, I took some of the Oracle VM agent Python code; this is a very simple example of how to create a lock domain, create a lock, and know whether or not you got the lock. The focus here is just a master lock, which you could use for an agent that is responsible for a virtual IP or for some executable that you want to locate on a given server, but the calls to create any kind of lock are in the code. Anyone who wants to experiment with this can add their own bits in a matter of minutes.

    The prerequisite is simple: take a number of servers, configure an ocfs2 volume and an ocfs2 cluster (see previous blog entries) and run the script. You do not have to set up an ocfs2 volume if you do not want to; you could just set up the domain without actually mounting the filesystem (see the global heartbeat blog). So practically this can be done with a very small, simple setup.

    My example has two nodes: wcoekaer-emgc1 and wcoekaer-emgc2 are the two Oracle Linux 6 nodes, configured with a shared disk and an ocfs2 filesystem mounted. This setup ensures that the dlmfs kernel module is loaded and the cluster is online. Take the Python code listed here and just execute it on both nodes.

    [root@wcoekaer-emgc2 ~]# lsmod |grep ocfs
    ocfs2                1092529  1 
    ocfs2_dlmfs            20160  1 
    ocfs2_stack_o2cb        4103  1 
    ocfs2_dlm             228380  1 ocfs2_stack_o2cb
    ocfs2_nodemanager     219951  12 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
    ocfs2_stackglue        11896  3 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb
    configfs               29244  2 ocfs2_nodemanager
    jbd2                   93114  2 ocfs2,ext4
    
    You see that the ocfs2_dlmfs kernel module is loaded.

    [root@wcoekaer-emgc2 ~]# mount |grep dlmfs
    ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
    
    The dlmfs virtual filesystem is mounted on /dlm.

    I now execute dlm.py on both nodes and show some output. After a while I kill (Ctrl-C) the script on the master node and you see the other node take over the lock. I then restart the dlm.py script, reboot the other node, and you see the same.

    [root@wcoekaer-emgc1 ~]# ./dlm.py 
    Checking DLM
    DLM Ready - joining domain : mycluster
    Starting main loop...
    i am master of the multiverse
    i am master of the multiverse
    i am master of the multiverse
    i am master of the multiverse
    ^Ccleaned up master lock file
    [root@wcoekaer-emgc1 ~]# ./dlm.py 
    Checking DLM
    DLM Ready - joining domain : mycluster
    Starting main loop...
    i am not the master
    i am not the master
    i am not the master
    i am not the master
    i am master of the multiverse
    
    This shows that I started as master, then hit Ctrl-C and dropped the lock; the other node takes the lock, then I reboot the other node and I take the lock again.

    [root@wcoekaer-emgc2 ~]# ./dlm.py
    Checking DLM
    DLM Ready - joining domain : mycluster
    Starting main loop...
    i am not the master
    i am not the master
    i am not the master
    i am not the master
    i am master of the multiverse
    i am master of the multiverse
    i am master of the multiverse
    ^Z
    [1]+  Stopped                 ./dlm.py
    [root@wcoekaer-emgc2 ~]# bg
    [1]+ ./dlm.py &
    [root@wcoekaer-emgc2 ~]# reboot -f
    
    Here you see that this node started without being master, then, at the time of the Ctrl-C on the other node, it became master; after a forced reboot, the lock automatically gets released.

    And here is the code, just copy it to your servers and execute it...

    #!/usr/bin/python
    # Copyright (C) 2006-2012 Oracle. All rights reserved.
    #
    # This program is free software; you can redistribute it and/or modify it under
    # the terms of the GNU General Public License as published by the Free Software
    # Foundation, version 2.  This program is distributed in the hope that it will
    # be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
    # Public License for more details.  You should have received a copy of the GNU
    # General Public License along with this program; if not, write to the Free
    # Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
    # 021110-1307, USA.
    
    import sys
    import subprocess
    import stat
    import time
    import os
    import re
    import socket
    from time import sleep
    
    from os.path import join, isdir, exists
    
    # defines
    # dlmfs is where the dlmfs filesystem is mounted
    # the default, normal place for ocfs2 setups is /dlm
    # ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
    DLMFS = "/dlm"
    
    # we need a domain name which really is just a subdir in dlmfs
    # default to "mycluster" so then it creates /dlm/mycluster
    # locks are created inside this directory/domain
    DLM_DOMAIN_NAME = "mycluster"
    DLM_DOMAIN_PATH = DLMFS + "/" + DLM_DOMAIN_NAME
    
    # the main lock to use for being the owner of a lock
    # this can be any name, the filename is just the lockname
    DLM_LOCK_MASTER = DLM_DOMAIN_PATH + "/" + "master"
    
    # just a timeout
    SLEEP_ON_ERR = 60
    
    def run_cmd(cmd, success_return_code=(0,)):
        if not isinstance(cmd, list):
            raise Exception("Only accepts list!")
        cmd = [str(x) for x in cmd]
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE, close_fds=True)
        (stdoutdata, stderrdata) = proc.communicate()
        if proc.returncode not in success_return_code:
            raise RuntimeError('Command: %s failed (%s): stderr: %s stdout: %s'
                               % (cmd, proc.returncode, stderrdata, stdoutdata))
        return str(stdoutdata)
    
    
    def dlm_ready():
        """
        Indicate if the DLM is ready or not.
    
        With dlmfs, the DLM is ready once the DLM filesystem is mounted
        under /dlm.
    
        @return: C{True} if the DLM is ready, C{False} otherwise.
        @rtype: C{bool}
    
        """
        return os.path.ismount(DLMFS)
    
    
    # just do a mkdir, if it already exists, we're good, if not just create it
    def dlm_join_domain(domain=DLM_DOMAIN_NAME):
        _dir = join(DLMFS, domain)
        if not isdir(_dir):
            os.mkdir(_dir)
        # else: already joined
    
    # leaving a domain is basically removing the directory.
    def dlm_leave_domain(domain=DLM_DOMAIN_NAME, force=True):
        _dir = join(DLMFS, domain)
        if force:
            cmd = ["rm", "-fr", _dir]
        else:
            cmd = ["rmdir", _dir]
        run_cmd(cmd)
    
    # acquire a lock
    def dlm_acquire_lock(lock):
        dlm_join_domain()
    
        # a lock is a filename in the domain directory
        lock_path = join(DLM_DOMAIN_PATH, lock)
    
        try:
            if not exists(lock_path):
                fd = os.open(lock_path, os.O_CREAT | os.O_NONBLOCK)
                os.close(fd)
            # create the EX lock
            # creating a file with O_RDWR causes an EX lock
            fd = os.open(lock_path, os.O_RDWR | os.O_NONBLOCK)
            # once the file is created in this mode, you can close it
            # and you still keep the lock
            os.close(fd)
        except Exception, e:
            if exists(lock_path):
                os.remove(lock_path)
            raise e
    
    
    def dlm_release_lock(lock):
        # releasing a lock is as easy as just removing the file
        lock_path = join(DLM_DOMAIN_PATH, lock)
        if exists(lock_path):
            os.remove(lock_path)
    
    def acquire_master_dlm_lock():
        ETXTBUSY = 26
    
        dlm_join_domain()
    
        # close() does not downconvert the lock level nor does it drop the lock. The
        # holder still owns the lock at that level after close.
        # close() allows any downconvert request to succeed.
        # However, a downconvert request is only generated for queued requests. And
        # O_NONBLOCK is specifically a noqueue dlm request.
    
        # 1) O_CREAT | O_NONBLOCK will create a lock file if it does not exist, whether
        #    we are the lock holder or not.
        # 2) if we hold O_RDWR lock, and we close but not delete it, we still hold it.
        #    afterward, O_RDWR will succeed, but O_RDWR | O_NONBLOCK will not.
        # 3) but if we do not hold the lock, O_RDWR will hang there waiting,
        #    which is not desirable -- any uninterruptible hang is undesirable.
        # 4) if nobody else holds the lock either, but the lock file exists as a side effect
        #    of 1), opening with O_NONBLOCK may result in ETXTBUSY
    
        # a) we need O_NONBLOCK to avoid scenario (3)
        # b) we need to delete it ourselves to avoid (2)
        #   *) if we do not succeed with (1), remove the lock file to avoid (4)
        #   *) if everything is good, we drop it and we remove it
        #   *) if killed by a program, this program should remove the file
        #   *) if crashed, but not rebooted, something needs to remove the file
        #   *) on reboot/reset the lock is released to the other node(s)
    
        try:
            if not exists(DLM_LOCK_MASTER):
                fd = os.open(DLM_LOCK_MASTER, os.O_CREAT | os.O_NONBLOCK)
                os.close(fd)
    
            master_lock = os.open(DLM_LOCK_MASTER, os.O_RDWR | os.O_NONBLOCK)
    
            os.close(master_lock)
            print "i am master of the multiverse"
            # at this point, I know I am the master and I can add code to do
            # things that only a master can do, such as, consider setting
            # a virtual IP or, if I am master, I start a program
            # and if not, then I make sure I don't run that program (or VIP)
            # so the magic starts here...
            return True
    
        except OSError, e:
            if e.errno == ETXTBUSY:
                print "i am not the master"
                # if we are not master and the file exists, remove it or
                # we will never succeed
                if exists(DLM_LOCK_MASTER):
                    os.remove(DLM_LOCK_MASTER)
            else:
                raise e
    
    
    def release_master_dlm_lock():
        if exists(DLM_LOCK_MASTER):
            os.remove(DLM_LOCK_MASTER)
    
    
    def run_forever():
        # set socket default timeout for all connections
        print "Checking DLM"
        if dlm_ready():
           print "DLM Ready - joining domain : " + DLM_DOMAIN_NAME
        else:
           print "DLM not ready - bailing, fix the cluster stack"
           sys.exit(1)
    
        dlm_join_domain()
    
        print "Starting main loop..."
    
        socket.setdefaulttimeout(30)
        while True:
            try:
                acquire_master_dlm_lock()
                sleep(20)
            except Exception, e:
                sleep(SLEEP_ON_ERR)
            except (KeyboardInterrupt, SystemExit):
                if exists(DLM_LOCK_MASTER):
                   os.remove(DLM_LOCK_MASTER)
                # if you control-c out of this, then you lose the lock!
                # delete it on exit for release
                print "cleaned up master lock file"
                sys.exit(1)
    
    
    if __name__ == '__main__':
        run_forever()
    

    Thursday Jan 03, 2013

    OCFS2 global heartbeat

    A cool, but often missed feature in Oracle Linux is the inclusion of OCFS2. OCFS2 is a native Linux cluster filesystem which was written many years ago at Oracle (hence the name Oracle Cluster Filesystem) and which got included in the mainline Linux kernel around 2.6.16, back in early 2006. The filesystem is widely used and has a number of really cool features.

  • simplicity : it's incredibly easy to configure the filesystem and clusterstack. There is literally one small text-based config file.
  • complete : ocfs2 contains all the components needed : a nodemanager, a heartbeat, a distributed lock manager and the actual cluster filesystem
  • small : the size of the filesystem and the needed tools is incredibly small. It consists of a few kernel modules and a small set of userspace tools. All the kernel modules together add up to about 2.5MB in size and the userspace package is a mere 800KB.
  • integrated : it's a native Linux filesystem so it makes use of all the normal kernel infrastructure. There is no duplication of structures or caches; it fits right into the standard Linux filesystem structure.
  • part of Oracle Linux/UEK : ocfs2, like other linux filesystems, is built as kernel modules. When customers use Oracle Linux's UEK or UEK2, we automatically compile the kernel modules for the filesystem. Other distributions like SLES have done the same. We fully support OCFS2 as part of Oracle Linux as a general purpose cluster filesystem.
  • feature rich :
    OCFS2 is POSIX-compliant
    Optimized Allocations (extents, reservations, sparse, unwritten extents, punch holes)
    REFLINKs (inode-based writeable snapshots)
    Indexed Directories
    Metadata Checksums
    Extended Attributes (unlimited number of attributes per inode)
    Advanced Security (POSIX ACLs and SELinux)
    User and Group Quotas
    Variable Block and Cluster sizes
    Journaling (Ordered and Writeback data journaling modes)
    Endian and Architecture Neutral (x86, x86_64, ia64 and ppc64) - yes, you can mount the filesystem in a heterogeneous cluster.
    Buffered, Direct, Asynchronous, Splice and Memory Mapped I/Os
    In-built Clusterstack with a Distributed Lock Manager
    Cluster-aware Tools (mkfs, fsck, tunefs, etc.)
  • One of the main features added most recently is Global Heartbeat. OCFS2 as a filesystem was typically used with what's called local heartbeat: for every filesystem you mounted, it would start its own local heartbeat and membership mechanism. The disk heartbeat means a disk I/O every 1 or 2 seconds for every node in the cluster, for every device. It was never a problem when the number of mounted volumes was relatively small, but once customers were using 20+ volumes the overhead of the multiple disk heartbeats became significant and at times became a stability issue.

    Global heartbeat was written to provide a solution to the multiple heartbeats. It is now possible to specify on which device(s) you want a heartbeat thread; you can then mount many other volumes that do not have their own, and the heartbeat is shared amongst that one (or those few) threads, significantly reducing disk I/O overhead.
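
    As a rough back-of-the-envelope illustration (using the numbers above, so treat it as an estimate only): with 20 mounted volumes, a 4-node cluster and one heartbeat I/O per node every 2 seconds, local heartbeat generates on the order of 20 * 4 / 2 = 40 small I/Os per second across the cluster; with a single global heartbeat device that drops to roughly 4 / 2 = 2 per second.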

    I was playing with this a little bit the other day and noticed that this wasn't very well documented, so why not write it up here and share it with everyone. Getting started with OCFS2 is really easy, and within just a few minutes it is possible to have a complete installation.

    I started with two servers installed with Oracle Linux 6.3. Each server has 2 network interfaces, one public and one private. The servers have a local disk and a shared storage device. For cluster filesystems, this shared storage device should typically be either a shared SAN disk or an iSCSI device, but it is also possible with Oracle Linux and UEK2 to create a shared virtual device on an nfs server and use this device for the cluster filesystem. This technique is used with Oracle VM where the shared storage is NAS-based. I just wrote a blog entry about how to do that here.

    While it is technically possible to create a working ocfs2 configuration using just one network and a single IP per server, it is certainly not ideal and not a recommended configuration for real world use. In any cluster environment it's highly recommended to have a private network for cluster traffic. The biggest reason for instability in a clustering environment is a bad/unreliable network and/or storage. Many times the environment has an overloaded network which causes network heartbeats to fail, or disks where failover takes longer than the default timeout configuration, and the only alternative we have at that point is to reboot the node(s).

    Typically when I do a test like this, I make sure I use the latest versions of the OS release. So after an installation of Oracle Linux 6.3, I just do a yum update on all my nodes to have the latest packages and also the latest kernel version installed, and then do a reboot. That gets me to 2.6.39-300.17.3.el6uek.x86_64 at the time of writing. Of course all this is freely accessible from http://public-yum.oracle.com.
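
    In case it helps, the update-and-verify sequence is simply the following (a minimal sketch; the kernel version shown is the one mentioned above and will differ over time):

    # yum -y update
    # reboot
    # uname -r
    2.6.39-300.17.3.el6uek.x86_64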

    Depending on the type of installation you did (basic, minimal, etc...) you may or may not have to add RPMs. Do a simple check, rpm -q ocfs2-tools, to see if the tools are installed; if not, just run yum install ocfs2-tools. And that's it. All required software is now installed. The kernel modules are already part of the uek2 kernel and the required tools (mkfs, fsck, o2cb, ...) are part of the ocfs2-tools RPM.

    Next up: create the filesystem on the shared disk device and configure the cluster.

    One requirement for using global heartbeat is that the heartbeat device needs to be a NON-partitioned disk. Other OCFS2 volumes you want to create and mount can be on partitioned disks, but a device for the heartbeat needs to be on an empty disk. Let's assume /dev/sdb in this example.
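
    Before formatting, it is worth double-checking that the device really carries no partitions. A quick way to do that (a sketch using standard tools, nothing ocfs2-specific) is to confirm that no sdb1/sdb2 entries show up and that fdisk reports no valid partition table:

    # cat /proc/partitions
    # fdisk -l /dev/sdb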

    # mkfs.ocfs2 -b 4K -C 4K -J size=4M -N 4 -L ocfs2vol1 \
    --cluster-name=ocfs2 --cluster-stack=o2cb --global-heartbeat /dev/sdb
    This creates a filesystem with a 4K block size (a normal value) and a cluster size of 4K (if you have many small files, this is a good value; if you have few large files, go up to 1M).

    Journal size of 4M; if you have a large filesystem with a lot of metadata changes you might want to increase this. I did not add an option for 32-bit or 64-bit journals; if you want to create huge filesystems, use block64, which uses jbd2.

    The filesystem is created for 4 nodes (-N 4); this can be modified later if your cluster needs to grow, as you can always tune this with tunefs.ocfs2 (see the example below).
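
    For example, growing from 4 to 8 node slots later would look something like this (a sketch; check the tunefs.ocfs2 man page on your version for the exact options):

    # tunefs.ocfs2 -N 8 /dev/sdb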

    Label ocfs2vol1: this is a disk label you can later use to mount the filesystem by label.
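
    For instance, once the cluster stack is configured you could mount by label rather than by device name (the mount point is just an example; in /etc/fstab, _netdev is commonly used so the mount waits for the network and cluster stack to come up):

    # mount -L ocfs2vol1 /mountpoint1
    (or, as an fstab entry)
    LABEL=ocfs2vol1  /mountpoint1  ocfs2  _netdev,defaults  0 0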

    cluster-name=ocfs2 : this is the default name, but if you want your own name for your cluster you can put a different value here; remember it, because you will need to configure the clusterstack with this clustername later.

    cluster-stack=o2cb : it is possible to use different cluster stacks, such as pacemaker or cman.

    global-heartbeat : make sure that the filesystem is prepared and built to support global heartbeat

    /dev/sdb : the device to use for the filesystem.

    
    # mkfs.ocfs2 -b 4K -C 4K -J size=4M -N 4 -L ocfs2vol1 --cluster-name=ocfs2 \
    --cluster-stack=o2cb --force --global-heartbeat /dev/sdb
    mkfs.ocfs2 1.8.0
    Cluster stack: o2cb
    Cluster name: ocfs2
    Stack Flags: 0x1
    NOTE: Feature extended slot map may be enabled
    Overwriting existing ocfs2 partition.
    WARNING: Cluster check disabled.
    Proceed (y/N): y
    Label: ocfs2vol1
    Features: sparse extended-slotmap backup-super unwritten inline-data strict-journal-super xattr indexed-dirs refcount discontig-bg
    Block size: 4096 (12 bits)
    Cluster size: 4096 (12 bits)
    Volume size: 10725765120 (2618595 clusters) (2618595 blocks)
    Cluster groups: 82 (tail covers 5859 clusters, rest cover 32256 clusters)
    Extent allocator size: 4194304 (1 groups)
    Journal size: 4194304
    Node slots: 4
    Creating bitmaps: done
    Initializing superblock: done
    Writing system files: done
    Writing superblock: done
    Writing backup superblock: 2 block(s)
    Formatting Journals: done
    Growing extent allocator: done
    Formatting slot map: done
    Formatting quota files: done
    Writing lost+found: done
    mkfs.ocfs2 successful
    

    Now, we just have to configure the o2cb stack and we're done.

  • add the cluster : o2cb add-cluster ocfs2 (these o2cb commands build up /etc/ocfs2/cluster.conf; a sketch of the resulting file follows this list)
  • add the nodes :
    o2cb add-node --ip 192.168.199.1 --number 0 ocfs2 host1
    o2cb add-node --ip 192.168.199.2 --number 1 ocfs2 host2
  • it is very important to use the hostname of the server (the name you get when typing hostname) for each node!
  • add the heartbeat device:
    run mounted.ocfs2 -d and take the UUID value of the filesystem/device you want to use for heartbeat:
    # mounted.ocfs2 -d
    Device      Stack  Cluster  F  UUID                              Label
    /dev/sdb   o2cb   ocfs2    G  244A6AAAE77F4053803734530FC4E0B7  ocfs2vol1
    
    o2cb add-heartbeat ocfs2 244A6AAAE77F4053803734530FC4E0B7
  • enable global heartbeat : o2cb heartbeat-mode ocfs2 global
  • start the clusterstack : /etc/init.d/o2cb enable
  • verify that the stack is up and running : o2cb cluster-status
  • That's it. If you want to enable this at boot time, you can configure o2cb to start automatically by running /etc/init.d/o2cb configure. This allows you to set different heartbeat timeout values and also whether or not to start the clusterstack at boot time.
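
    For reference, the o2cb commands above end up writing a small /etc/ocfs2/cluster.conf. The file should look roughly like this sketch (exact layout and defaults such as ip_port can vary between ocfs2-tools versions, so treat it as an illustration only):

    cluster:
            name = ocfs2
            heartbeat_mode = global
            node_count = 2

    node:
            cluster = ocfs2
            number = 0
            ip_port = 7777
            ip_address = 192.168.199.1
            name = host1

    node:
            cluster = ocfs2
            number = 1
            ip_port = 7777
            ip_address = 192.168.199.2
            name = host2

    heartbeat:
            cluster = ocfs2
            region = 244A6AAAE77F4053803734530FC4E0B7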

    Now that a first node is configured, all you have to do is copy the file /etc/ocfs2/cluster.conf to all the other nodes in your cluster. You do not have to edit it on the other nodes, you just need to have an exact copy everywhere. You also do not need to redo the above commands; just make sure ocfs2-tools is installed everywhere and, if you want to start at boot time, re-run /etc/init.d/o2cb configure on the other nodes as well. From here on, you can just mount your filesystems :

    mount /dev/sdb /mountpoint1 on each node.
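
    Concretely, on each additional node the sequence would be something like this (a sketch; host1 is the node configured above, and the install/configure steps only matter if they were not already done on that node):

    # yum install ocfs2-tools
    # scp root@host1:/etc/ocfs2/cluster.conf /etc/ocfs2/cluster.conf
    # /etc/init.d/o2cb configure
    # /etc/init.d/o2cb enable
    # mkdir -p /mountpoint1
    # mount /dev/sdb /mountpoint1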

    If you create more OCFS2 volumes you can just keep mounting them all, and with global heartbeat, you will just have one (or a few) heartbeats running.

    have fun...

    Here is vmstat output; the first capture shows a single global heartbeat with 8 mounted filesystems, the second shows 8 mounted filesystems each with their own local heartbeat. Even though the I/O volume is low, it shows that about 8x more I/Os are happening (from 1 every other second to 4 every second). As these are small I/Os, they move the disk head to a specific place all the time and interrupt performance if you have a heartbeat on each device. Hopefully this shows the benefits of global heartbeat.

    # vmstat 1
    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     0  0      0 789752  26220  97620    0    0     1     0   41   34  0  0 100  0  0
     0  0      0 789752  26220  97620    0    0     0     0   46   22  0  0 100  0  0
     0  0      0 789752  26220  97620    0    0     1     1   38   29  0  0 100  0  0
     0  0      0 789752  26228  97620    0    0     0    52   52   41  0  0 100  1  0
     0  0      0 789752  26228  97620    0    0     1     0   28   26  0  0 100  0  0
     0  0      0 789760  26228  97620    0    0     0     0   30   30  0  0 100  0  0
     0  0      0 789760  26228  97620    0    0     1     1   26   20  0  0 100  0  0
     0  0      0 789760  26228  97620    0    0     0     0   54   37  0  1 100  0  0
     0  0      0 789760  26228  97620    0    0     1     0   29   28  0  0 100  0  0
     0  0      0 789760  26236  97612    0    0     0    16   43   48  0  0 100  0  0
     0  0      0 789760  26236  97620    0    0     1     1   48   28  0  0 100  0  0
     0  0      0 789760  26236  97620    0    0     0     0   42   30  0  0 100  0  0
     0  0      0 789760  26236  97620    0    0     1     0   26   30  0  0 100  0  0
     0  0      0 789760  26236  97620    0    0     0     0   35   24  0  0 100  0  0
     0  1      0 789760  26240  97616    0    0     1    21   29   27  0  0 100  0  0
     0  0      0 789760  26244  97620    0    0     0     4   51   44  0  0 100  0  0
     0  0      0 789760  26244  97620    0    0     1     0   31   24  0  0 100  0  0
     0  0      0 789760  26244  97620    0    0     0     0   25   28  0  0 100  0  0
     0  0      0 789760  26244  97620    0    0     1     1   30   20  0  0 100  0  0
     0  0      0 789760  26244  97620    0    0     0     0   41   30  0  0 100  0  0
     0  0      0 789760  26252  97616    0    0     1    16   56   44  0  0 100  0  0
    
    
    
    # vmstat 1
    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     0  0      0 784364  28732  98620    0    0     4    46   54   64  0  0 100  0  0
     0  0      0 784364  28732  98628    0    0     4     2   60   48  0  0 100  0  0
     0  0      0 784364  28732  98628    0    0     4     2   51   53  0  0 100  0  0
     0  0      0 784364  28732  98628    0    0     4     2   58   50  0  0 100  0  0
     0  0      0 784364  28732  98628    0    0     4     2   56   44  0  0 100  0  0
     0  0      0 784364  28732  98628    0    0     4     2   46   47  0  0 100  0  0
     0  0      0 784364  28732  98628    0    0     4     2   65   54  0  0 100  0  0
     0  0      0 784388  28740  98620    0    0     4    14   65   55  0  0 100  0  0
     0  0      0 784388  28740  98628    0    0     4     2   46   48  0  0 100  0  0
     0  0      0 784388  28740  98628    0    0     4     2   52   42  0  0 100  0  0
     0  0      0 784388  28740  98628    0    0     4     2   51   58  0  0 100  0  0
     0  0      0 784388  28740  98628    0    0     4     2   36   43  0  0 100  0  0
     0  0      0 784396  28740  98628    0    0     4     2   39   47  0  0 100  0  0
     0  0      0 784396  28740  98628    0    0     4     2   52   54  0  0 100  0  0
     0  0      0 784396  28740  98628    0    0     4     2   42   48  0  0 100  0  0
     0  0      0 784404  28748  98620    0    0     4    14   52   63  0  0 100  0  0
     0  0      0 784404  28748  98628    0    0     4     2   32   42  0  0 100  0  0
     0  0      0 784404  28748  98628    0    0     4     2   50   40  0  0 100  0  0
     0  0      0 784404  28748  98628    0    0     4     2   58   56  0  0 100  0  0
     0  0      0 784412  28748  98628    0    0     4     2   39   46  0  0 100  0  0
     0  0      0 784412  28748  98628    0    0     4     2   45   50  0  0 100  0  0
     0  0      0 784412  28748  98628    0    0     4     2   43   42  0  0 100  0  0
     0  0      0 784288  28748  98628    0    0     4     6   48   52  0  0 100  0  0
    
    

    dm nfs

    A little-known feature that we make good use of in Oracle VM is called dm nfs: basically, the ability to create a device-mapper device directly on top of an nfs-based file. We use this in Oracle VM 3 if your shared storage for the cluster is nfs based.

    Oracle VM clustering relies on the OCFS2 clusterstack/filesystem that is native in the kernel (uek2/2.6.39-x). When we create an HA-enabled pool, we create what we call a pool filesystem. That filesystem contains an ocfs2 volume so that we can store cluster-wide data. In particular we store shared database files that are needed by the Oracle VM agents on the nodes for HA. It contains info on pool membership, which VMs are in HA mode, what the pool IP is, etc...

    When the user provides an nfs filesystem for the pool, we do the following :

  • mount the nfs volume in /nfsmnt/
  • create a 10GB sized file ovspoolfs.img
  • create a dm nfs volume (/dev/mapper/ovspoolfs) on this ovspoolfs.img file
  • create an ocfs2 volume on this dm nfs device
  • mount the ocfs2 volume on /poolfsmnt/

    If someone wants to try out something that relies on block-based shared storage devices, such as ocfs2, but does not have iSCSI or SAN storage, using nfs is an alternative and dm nfs just makes it really easy.

    To do this yourself, the following commands will do it for you :

  • to find out if any such devices exist just type dmsetup table --target nfs
  • to create your own device, do something like this:
  • mount mynfsserver:/mountpoint /mnt
    dd if=/dev/zero of=/mnt/myvolume.img bs=1M count=2000 
    dmsetup create myvolume --table "0 4096000 nfs /mnt/myvolume.img 0"
    
    So mount the nfs volume, create a file which will be the container of the block device (in this case a 2GB file), and then create the dm device. The values for the dmsetup command are the following:

    myvolume = the name of the /dev/mapper device. Here we end up with /dev/mapper/myvolume

    table = start (normally always 0), length (this is in 512-byte sectors, so the 2000MB file above becomes 2000 * 2048 = 4096000), nfs (the device-mapper target type, since this sits on nfs), the filename of the nfs-based file, and offset (normally always 0).
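
    If you prefer not to hard-code the sector count, here is a small sketch that derives it from the file size with stat (the numbers assume the 2000MB file created above):

    # SIZE=$(stat -c %s /mnt/myvolume.img)
    # dmsetup create myvolume --table "0 $((SIZE / 512)) nfs /mnt/myvolume.img 0"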

    So now you have /dev/mapper/myvolume; it acts like a normal block device. If you do this on multiple servers, you can actually create an ocfs2 filesystem on this block device and it will be consistent across the servers.
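
    For example, once /dev/mapper/myvolume exists on every server (each one running the same mount and dmsetup commands against the same file), and assuming the o2cb cluster stack is already configured as described in the global heartbeat post above, you could do something like this (the label nfsvol and the mount point are just placeholders):

    # mkfs.ocfs2 -b 4K -C 4K -N 4 -L nfsvol /dev/mapper/myvolume    (on one node only)
    # mkdir -p /mnt/shared
    # mount /dev/mapper/myvolume /mnt/shared                        (on every node)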

    Credits go to Chuck Lever for writing dm nfs in the first place, thanks Chuck :) The code for dm nfs is here.

    About

    Wim Coekaerts is the Senior Vice President of Linux and Virtualization Engineering for Oracle. He is responsible for Oracle's complete desktop to data center virtualization product line and the Oracle Linux support program.

    You can follow him on Twitter at @wimcoekaerts
