
May 2009 Archives

May 4, 2009

OCFS2 reflink

It has been a while since I last wrote something about OCFS2. For those who don’t know what this is, OCFS2 is a feature-rich standard Linux cluster filesystem. Linus took OCFS2 into mainline in the 2.6.16 time frame and it is being actively maintained. The majority of the work has always been done at Oracle; however, folks from Novell have provided many contributions, as have individuals like Christoph Hellwig.

OCFS2 is a really nice filesystem that is used by many people out there. If we track the ocfs2-users and ocfs2-devel mailing lists, it is clear that many of them use it for their own applications.

We provide OCFS2 RPMs for Oracle Enterprise Linux (OEL) and Red Hat Enterprise Linux (RHEL) on oracle.com, and we make the RPMs available on ULN for Oracle Unbreakable Linux support customers. Even though the code is in the 2.6.18 kernel that is used in RHEL5, Red Hat decided not to compile the modules, so we build them outside of the kernel. (We do not modify the kernel config to build them in, because that would be considered a change.)

For people who want to use OCFS2 or play with it, it’s included in OEL as an extra (not a modification of existing RHEL code). You can get the RPMs for RHEL from the Oracle Technology Network (OTN). It is all free to use and download. If you need support, you can purchase an Unbreakable Linux support subscription, which includes support for the filesystem.

You can find tons of information on our oss.oracle.com website http://oss.oracle.com/projects/ocfs2/. Some of the new features that are in the mainline Linux release of the filesystem are listed below. The most notable one is REFLINK which I will cover in more detail. All OCFS2 development is public and every change is immediately published on oss in our git repositories.

- Extended attributes. The value of each extended attribute can be as large as a regular file, which is larger than even ext3 allows.

- POSIX ACL support.

- Support for userspace cluster stacks. If needed, it is possible to use OCFS2 with cman and pacemaker.

- jbd2 support. This gives us 64-bit block numbers, so we can theoretically support 4PB filesystems. With jbd1 the limit was 16TB per filesystem.

- Quota support.

- Metadata checksums and ECC. All metadata blocks in OCFS2 now have a checksum field. If the checksum fails, there is an ECC field that can recover a single-bit error. If the error is unrecoverable, OCFS2 makes only the affected inode unreadable; the rest of the filesystem is not affected. In most filesystems this would take the entire filesystem into read-only mode.

- Improved inode allocation. This will help with filesystems that hold a very large number of files.

- Indexed directories. This will improve the performance of lookups of a single name.

- reflink, which creates a target inode that shares the data extents of the source inode in a copy-on-write fashion.
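
To get a feel for where size limits like the 16TB and 4PB figures in the jbd2 item come from, here is a small back-of-the-envelope calculation. It is only a sketch: the 4 KiB block size, the 1 MiB cluster size and the 32-bit counters are assumptions that are not stated in this post, chosen because they reproduce the quoted numbers.

```python
# Back-of-the-envelope check of the filesystem size limits quoted above.
# Assumptions (not stated in the post): 4 KiB blocks, 1 MiB clusters,
# and 32-bit block/cluster numbers.

KIB = 1024
MIB = 1024 ** 2
TIB = 1024 ** 4
PIB = 1024 ** 5


def max_bytes(address_bits: int, unit_size: int) -> int:
    """Largest size addressable with `address_bits`-wide unit numbers."""
    return (2 ** address_bits) * unit_size


# 32-bit block numbers with 4 KiB blocks (assumed jbd1-style limit)
jbd1_limit = max_bytes(32, 4 * KIB)

# 32-bit cluster numbers with 1 MiB clusters (assumed source of the 4PB figure)
cluster_limit = max_bytes(32, 1 * MIB)

print(jbd1_limit // TIB, "TiB")     # 16 TiB
print(cluster_limit // PIB, "PiB")  # 4 PiB
```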


Now, about reflink. The reason we implemented reflink is for Oracle VM. As you know, a virtual machine/guest owns one or more virtual disks. These virtual disks are represented as files on a filesystem hosted by the hypervisor. In the case of Oracle VM, if you have SAN or iSCSI storage, we put an OCFS2 filesystem on top of this, managed by the management domain (dom0). The virtual disks live on top of this OCFS2 volume.

These virtual disks can become very large; they are usually many GBs in size. So when a user wants to create a clone of a virtual machine, or create a virtual machine based on an existing template, we copy the content of the original virtual disks to a new set of virtual disks. By default this doubles the amount of storage used.

For example, you have VM1 with a 40 GB virtual disk (vm1/system.img) and you want to copy that to create VM2 based on the same virtual disk image (vm2/system.img).

The reflink feature in OCFS2, which was posted to linux-fsdevel and ocfs2-devel a while back, supports this operation by effectively creating hard links with copy-on-write semantics (basically a point-in-time data hard link).

Today, we copy the file vm1/system.img to vm2/system.img. Tomorrow, we will run reflink vm1/system.img vm2/system.img instead. At creation time no additional space is used and no actual copying is done; reflink just creates a totally new inode/file that shares the data extents. As soon as a write is done to one side or the other, 1 MB chunks are copied where the writes occur.

This allows us to create instant copies of files (or in the case of Oracle VM, virtual disk images).
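
To make the sharing and copy-on-write behaviour a bit more concrete, here is a toy in-memory model. It is only an illustration of the idea described above, not how OCFS2 implements it; the class and function names are made up for this sketch, and the 1 MB chunk size is taken from the description.

```python
# Toy, in-memory model of reflink-style copy-on-write sharing.
# All names are hypothetical; this is not the OCFS2 implementation.

CHUNK = 1024 * 1024  # 1 MB copy-on-write granularity, as described above


class Chunk:
    """A unit of storage that may be shared by several files."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.refs = 1  # number of files pointing at this chunk


class File:
    def __init__(self, chunks):
        self.chunks = chunks

    @classmethod
    def create(cls, nchunks: int) -> "File":
        """Create a zero-filled file of `nchunks` chunks."""
        return cls([Chunk(bytes(CHUNK)) for _ in range(nchunks)])

    def reflink(self) -> "File":
        """Return a new file sharing every chunk; no data is copied."""
        for chunk in self.chunks:
            chunk.refs += 1
        return File(list(self.chunks))

    def write(self, offset: int, data: bytes) -> None:
        """Write inside the existing file, copying shared chunks first."""
        written = 0
        while written < len(data):
            idx, start = divmod(offset + written, CHUNK)
            n = min(len(data) - written, CHUNK - start)
            chunk = self.chunks[idx]
            if chunk.refs > 1:
                # Shared chunk: give this file its own private copy.
                chunk.refs -= 1
                chunk = Chunk(bytes(chunk.data))
                self.chunks[idx] = chunk
            chunk.data[start:start + n] = data[written:written + n]
            written += n


def chunks_in_use(*files) -> int:
    """Count distinct chunks, i.e. the space actually consumed."""
    return len({id(c) for f in files for c in f.chunks})


vm1 = File.create(8)                   # an 8 MB "disk image"
vm2 = vm1.reflink()                    # instant copy
print(chunks_in_use(vm1, vm2))         # 8: nothing has been copied yet
vm2.write(3 * CHUNK + 10, b"x" * 100)  # modify only the copy
print(chunks_in_use(vm1, vm2))         # 9: exactly one chunk was copied
```

The original file is untouched by the write: only the chunk that was written to exists twice, everything else is still shared.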

Some of the advantages of reflink are:

- Each “hard link”, or point-in-time copy, is a regular file to the OS and to applications, so no changes are needed to applications or backup software. This is totally transparent: there is no container around these files, unlike vmdk and vhd, where the snapshots live inside the containers.

- It is fully cluster safe. The link and the COW work on any node of an OCFS2 cluster, even if the file is open and in use on another node. In the Oracle VM case, this allows us to create snapshots and run the new VMs on a different node than the one the original VM is running on.

- This is a generic feature, just like symlink. It is available to any user or application.

- It is open source (part of the OCFS2 code) and free for anyone to use.


Below is an example of reflink. It shows the disk space usage and the time it takes to complete the commands, and then makes a simple modification to one file with dd to show how that affects both files.

ls -l
total 1771896
-rw-r--r-- 1 root root 1814420898 May 1 12:58 el4.5-system.img
===============================================================

df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/sde1 50G 3.9G 47G 8% /ocfs2
===============================================================

reflink el4.5-system.img el4.5-system1.img

real 0m0.030s
user 0m0.000s
sys 0m0.000s
===============================================================
ls -l
total 1771896
-rw-r--r-- 1 root root 1814420898 May 1 12:59 el4.5-system1.img
-rw-r--r-- 1 root root 1814420898 May 1 12:58 el4.5-system.img
===============================================================
df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/sde1 50G 3.9G 47G 8% /ocfs2
===============================================================
md5sum el4.5-system.img el4.5-system1.img
c41b670c59e8a4446ad07e9fb0f98b6d el4.5-system.img

real 0m31.094s
user 0m7.420s
sys 0m10.530s
c41b670c59e8a4446ad07e9fb0f98b6d el4.5-system1.img

real 0m34.553s
user 0m7.500s
sys 0m10.140s

===============================================================
dd if=/dev/zero of=el4.5-system1.img bs=1M count=1000 seek=500 conv=notrunc
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 104.889 seconds, 10.0 MB/s
===============================================================
ls -l
total 3543792
-rw-r--r-- 1 root root 1814420898 May 1 13:02 el4.5-system1.img
-rw-r--r-- 1 root root 1814420898 May 1 12:58 el4.5-system.img
===============================================================
df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/sde1 50G 4.9G 46G 10% /ocfs2
===============================================================
md5sum el4.5-system.img el4.5-system1.img
c41b670c59e8a4446ad07e9fb0f98b6d el4.5-system.img

real 0m32.430s
user 0m7.920s
sys 0m11.340s
b67b39c3c86a4110cb795f516bc7f86b el4.5-system1.img

real 0m32.069s
user 0m7.920s
sys 0m10.350s
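
As a quick sanity check of the numbers in the transcript above, the following snippet recomputes the amount of data dd wrote, its throughput, and the growth in used space reported by df. All values are copied from the output; nothing new is measured here.

```python
# Recompute a few figures from the transcript above.
dd_bytes = 1048576000     # bytes reported by dd
dd_seconds = 104.889      # elapsed time reported by dd

mib_written = dd_bytes / 2**20
throughput_mb_s = dd_bytes / dd_seconds / 1e6

df_used_before_g = 3.9    # "Used" column before the dd
df_used_after_g = 4.9     # "Used" column after the dd
df_growth_g = round(df_used_after_g - df_used_before_g, 1)

print(mib_written)                # 1000.0
print(round(throughput_mb_s, 1))  # 10.0
print(df_growth_g)                # 1.0
```

So the used space grows by roughly the amount that was overwritten in the copy, not by the full size of the image.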

enjoy.

May 6, 2009

Oracle VM Manager CLI and Web Services API

For the last several months we have been working on a web services API (WSDL) and a command line interface (CLI) for Oracle VM Manager. The CLI uses the web services interface and is written in Python, so it can run on any platform where Python is installed. The API exposes all the interfaces that the Oracle VM Manager UI components call, such as managing server pools, servers, virtual machines and templates.

It is now very easy to manage your Oracle VM server pools and virtual machines from a shell prompt. I was playing with this yesterday and figured I would take this opportunity to post an example.

In the example below, I had installed Oracle VM Manager (but not yet logged into it) and the CLI scripts on one system, and Oracle VM Server on another machine. On the Oracle VM server I had manually downloaded an Oracle VM template and created my own virtual machine without using the manager at all. So I create a new server pool in Oracle VM Manager using the CLI shell, register my existing virtual machine, and at the end show the list of commands we expose through the CLI and the web services API. I think this is going to be very useful for many Oracle VM users.

configure the location of the Manager instance

[wcoekaer@aldebaran-pc ~]$ ovm config
This is a wizard to help you start running the Oracle VM Command Line Manager.
Ctrl-C to exit.
Enter the host to connect:aldebaran-pc
Enter the port to connect:8888
Enter the deploy path (blank to default):
Enter the path of the vncviewer (blank to skip):
Would you like to enable WS-Security support? (Y/n)n
Configuration finish.
Please run the Oracle VM Command Line Manager again.

Starting up the cli in shell mode

[wcoekaer@aldebaran-pc ~]$ ovm -u admin shell
Enter Login Password:
Welcome to the Oracle VM Manager Shell. Type "help" for a list of commands.

I want to create a server pool but forgot the syntax

ovm> help serverpool_create
usage: ovm serverpool_create [options]
options:
-h, --help show this help message and exit
-H HOSTNAME, --hostname=HOSTNAME
(Required) Server Host/IP
-s SERVERPOOL_NAME, --serverpool_name=SERVERPOOL_NAME
(Required) Server Pool Name
-a, --ha_enabled Enable High Availability
-A AGENT_PASSWORD, --agent_password=AGENT_PASSWORD
Agent Password
-U UTILITY_USERNAME, --utility_username=UTILITY_USERNAME
(Required) Utility Server Username
-P UTILITY_PASSWORD, --utility_password=UTILITY_PASSWORD
Utility Server Password
-L SERVER_LOCATION, --server_location=SERVER_LOCATION
Server Location
-D DESCRIPTION, --description=DESCRIPTION
Description

create my server pool, the Oracle VM server name is wcoekaer-srv4

ovm> serverpool_create -s mypool -A ****** -H wcoekaer-srv4 -U root -P ****** -L myoffice -D something
ServerPool "mypool" has been created

you can see it shows as the only pool (mypool)

ovm> serverpool_list
Server Pool Name    Status    HA
mypool              Active    Disabled

there are no images imported yet so let's import the virtual machine I had already created

ovm> image_list
Name Size(MB) ServerPoolName Status CreationTime


ovm> image_register -s mypool -n dom1 -u root -p ******* -c ****** -o "Oracle Enterprise Linux 5" -d mydom1
Registering, please check the status.

as you can see, it now shows up

ovm> image_list
Name    Size(MB)    ServerPoolName    Status     CreationTime
dom1    6229.0      mypool            Pending    2009-05-05

but I need to approve it, of course

ovm> image_approve -s mypool -n dom1
VM Image "dom1" has been successfully approved.

and here it is. It shows that it's still up and running because I did not shut down the virtual machine; there was no need to

ovm> vm_list
Name    ImageSize    Mem    VCPUs    Status     ServerPoolName
dom1    6229.0       256    1        Running    mypool

a list of all the options

ovm> help
Usage: ovm [options] subcommand [suboptions]
Oracle VM Command Line Manager.
ovm full list of subcommands:
agent_version --- Get an agent version
config --- Start a configuration wizard
group_create --- Create a User Group.
group_list --- List of all the groups.
help --- Show help
image_approve --- Approve a VM Image
image_del --- Delete a VM image
image_discover --- List all of the Discoverable VM images
image_import --- Import an Image from an External Source
image_list --- Get a list of VM images
image_register --- Register a Discoverable VM image
image_status --- Show the Image status
iso_approve --- Approve an ISO image
iso_del --- Delete an ISO image
iso_discover --- List all of the Discoverable ISOs
iso_import --- Import an ISO from External Source
iso_list --- Get a list of ISO images
iso_register --- Register a Discoverable ISO image
iso_status --- Show the ISO status
os_list --- List all the available Operating Systems
server_add --- Add a Server to the ServerPool
server_config --- Config a Virtual Server
server_del --- Delete a Server from the ServerPool
server_info --- Get a VM Server info
server_list --- Get a list of VM Servers
server_poweroff --- Poweroff a VM Server
server_restart --- Reboot a VM Server
server_status --- Show the server status
serverpool_config --- Config a ServerPool
serverpool_create --- Create a ServerPool
serverpool_del --- Delete a ServerPool
serverpool_info --- Get a ServerPool info
serverpool_list --- Get a list of ServerPools
serverpool_refresh --- Refresh all of the ServerPools
serverpool_restore --- Restore a ServerPool
serverpool_status --- Get a ServerPool status
shareddisk_create --- Create and Register a Shared Virtual Disk
shareddisk_del --- Delete a Shared Virtual Disk
shareddisk_list --- Get a list of Shared Virtual Disks
shell --- Launch an interactive shell
template_approve --- Approve a Template
template_del --- Delete a Template
template_discover --- List all of the Discoverable Templates
template_import --- Import a Template from an External Source
template_list --- Get a list of Templates
template_register --- Register a Discoverable Template
template_status --- Show the template status
use --- Sepcify a ServerPool to use
user_assign_group --- Assign a user to the Group.
user_assign_serverpool --- Assign a user to the ServerPool.
user_create --- Create a User Account.
user_list --- List of all the users.
vm_add_disk --- Create and Add a disk to the VM
vm_add_nic --- Create and Add a nic to the VM
vm_as_template --- Save a VirtualMachine as template
vm_attach_cdrom --- Attach a CDROM to the VM
vm_attach_shareddisk --- Attach a Shared Virtual Disk to the VM
vm_clone --- Clone a VirtualMachine
vm_config --- Config a VirtualMachine
vm_create --- Create a VM
vm_del --- Delete a VirtualMachine
vm_del_disk --- Remove a disk from the VM
vm_del_nic --- Remove a nic from the VM
vm_deploy --- Deploy a VirtualMachine
vm_detach_cdrom --- Detach CDROMs from the VM
vm_detach_shareddisk --- Detach a Shared Virtual Disk from the VM
vm_info --- Get a VM info
vm_list --- Get a list of VMs
vm_list_cdrom --- List CDROMs of the VM
vm_list_disk --- List Disks of the VM
vm_list_nic --- List Virtual Network Interfaces of the VM
vm_migrate --- Live Migration
vm_migrate_all --- Migrate all the VMs on the server
vm_pause --- Pause a VirtualMachine
vm_poweroff --- Poweroff a VirtualMachine
vm_poweron --- PowerOn a VirtualMachine
vm_reboot --- Restart a VirtualMachine
vm_reset_status --- Reset status of a VirtualMachine
vm_resume --- Resume a VirtualMachine
vm_set_bootdevice --- Set the first BootDevice
vm_set_keyboardlayout --- Set the Keyboard Layout
vm_set_vnc_pwd --- Set the VNC Console Password
vm_status --- Get a VM status
vm_suspend --- Suspend a VirtualMachine
vm_unpause --- Unpause a VirtualMachine
vncviewer --- Start a VNC console.
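
Since the usage line reads "ovm [options] subcommand [suboptions]", the same commands can presumably be scripted instead of typed into the shell. Below is a minimal sketch that only builds the argument list for such a call. The helper name is made up, it uses only flags that appear in the help output above, and it is an assumption that the subcommands behave the same way outside the interactive shell. Passwords are left out on purpose; how the tool handles that in a script is not covered in this post.

```python
# Sketch: build an argument list for a scripted `ovm` call.
# `ovm_command` is a hypothetical helper, not part of the Oracle VM CLI.
# Only flags shown in the help output above are used.
import shlex


def ovm_command(user: str, subcommand: str, options: dict) -> list:
    """Return argv for: ovm -u USER SUBCOMMAND [-X value ...]."""
    argv = ["ovm", "-u", user, subcommand]
    for flag, value in options.items():
        argv.append(flag)
        if value is not None:  # flags such as -a take no value
            argv.append(str(value))
    return argv


argv = ovm_command("admin", "serverpool_create", {
    "-s": "mypool",          # server pool name
    "-H": "wcoekaer-srv4",   # server host
    "-U": "root",            # utility server username
    "-L": "myoffice",        # server location
    "-D": "something",       # description
})

# Print the command line instead of running it.
print(shlex.join(argv))
```

Passing the list to subprocess.run(argv) would execute it on a machine where the CLI is installed and configured.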

About May 2009

This page contains all entries posted to Wim Coekaerts Blog in May 2009. They are listed from oldest to newest.
