What's new in Oracle Linux, Part 1: OCFS2 1.6 REFLINKs

This is the first part of a multi-part series on what is new and useful in Oracle Linux 5.5.

Part 1 covers the OCFS2 1.6 file system.  If you are new to OCFS2, here is a very brief primer.

OCFS2 is a file system, just like ext3 or NTFS.  OCFS2 was created by Oracle and donated to the open source community, and is now part of the mainline Linux kernel.  OCFS2 has several attributes that make it an attractive option for many different applications.

First, OCFS2 is a shared-disk cluster file system.  That means OCFS2 volumes can be mounted on multiple servers simultaneously, and the file system coordinates their changes for you.  OCFS2 is also POSIX compliant, so it works with just about any program that runs under Linux.  Its on-disk format is architecture-independent, so you can mount the same OCFS2 volume on x86, x86_64, ia64 or ppc64 Linux servers, regardless of their byte order (big- or little-endian).  Finally, OCFS2 is journaled, which keeps the file system metadata consistent even after a crash.

Those may all sound very technical, but together they add up to a stable, high-performance file system.  Version 1.6 adds some highly anticipated features, including:
  • Extended attributes
  • POSIX access control lists
  • Indexed directories
  • Metadata checksums
  • User and Group quotas
  • Allocation reservation
  • JBD2 support
  • Discontiguous block support
The most anticipated new feature, though, is REFLINK support, so that will be the focus of this article.


REFLINKs provide the kind of snapshotting found in expensive data backup solutions and virtualization vendor offerings, without the complexity or proprietary extensions.  Instead of requiring a special API, REFLINKs are created with a simple command:  reflink.  Its syntax mirrors ln, but the result is very different: a hard link is a second name for the same inode, while a REFLINK is a new, independent file that initially shares the original's data.
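The difference from a hard link is easy to check on any Linux file system.  The minimal Python sketch below uses only the standard os.link and os.stat calls.  It shows that a hard link is the same inode under a second name, so a write through one name shows up through the other.  The helper name hard_link_demo is hypothetical.  A REFLINK gets its own inode instead, so its writes stay private.

```python
import os
import tempfile


def hard_link_demo() -> tuple[bool, str]:
    """Hypothetical helper: create a file, hard-link it, write via the link, read via the original."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "test")
        link = os.path.join(d, "test.hardlink")
        with open(original, "w") as f:
            f.write("original\n")
        os.link(original, link)  # a second directory entry pointing at the same inode
        same_inode = os.stat(original).st_ino == os.stat(link).st_ino
        with open(link, "a") as f:
            f.write("changed via link\n")  # modify the file through the link name
        with open(original) as f:
            content = f.read()  # the change is visible through the original name too
    return same_inode, content


if __name__ == "__main__":
    print(hard_link_demo())
```

Because both names share one inode, a hard link cannot serve as a snapshot: there is only ever one copy of the data.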

REFLINKs enable a user to create multiple writeable snapshots of a regular file.  When you create a REFLINK, the file system creates a new inode with the same extent pointers as the original inode, so multiple inodes share the same data extents.  REFLINKs work across the cluster, but not across file systems: you can't, for example, create a REFLINK in a directory on an ext3 volume that points to a file on an OCFS2 volume.
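To make the extent-sharing idea concrete, here is a toy Python model that tracks how many inodes point at each physical block.  It is a sketch under simplifying assumptions, not OCFS2's implementation.  Volume, Inode, reflink(), write_block() and unlink() are hypothetical names.  Real OCFS2 keeps this bookkeeping in on-disk refcount trees and shares data in clusters rather than single blocks.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    # Extent pointers, simplified to one physical block number per entry.
    blocks: list[int] = field(default_factory=list)


class Volume:
    """Hypothetical toy volume: counts how many inodes reference each physical block."""

    def __init__(self) -> None:
        self.refcount: dict[int, int] = {}  # physical block -> number of inodes sharing it
        self.next_block = 0

    def _allocate(self) -> int:
        blk = self.next_block
        self.next_block += 1
        self.refcount[blk] = 1
        return blk

    def used_blocks(self) -> int:
        # Like df: each physical block counts once, however many inodes share it.
        return len(self.refcount)

    def create(self, nblocks: int) -> Inode:
        return Inode([self._allocate() for _ in range(nblocks)])

    def reflink(self, src: Inode) -> Inode:
        # New inode, same extent pointers: no data is copied, only reference counts change.
        for blk in src.blocks:
            self.refcount[blk] += 1
        return Inode(list(src.blocks))

    def write_block(self, inode: Inode, index: int) -> None:
        blk = inode.blocks[index]
        if self.refcount[blk] > 1:
            # Copy-on-write: leave the shared block to the other inodes and
            # give this inode a freshly allocated private copy.
            self.refcount[blk] -= 1
            inode.blocks[index] = self._allocate()
        # A block with refcount 1 is private and is simply overwritten in place.

    def unlink(self, inode: Inode) -> None:
        # A block is freed only when the last inode referencing it goes away.
        for blk in inode.blocks:
            self.refcount[blk] -= 1
            if self.refcount[blk] == 0:
                del self.refcount[blk]
        inode.blocks.clear()


if __name__ == "__main__":
    vol = Volume()
    test = vol.create(1000)       # a 1000-block stand-in for the "test" file
    test1 = vol.reflink(test)     # like: reflink test test.1
    print(vol.used_blocks())      # 1000: nothing new was allocated
    vol.write_block(test1, 999)   # modify the last block of test.1
    print(vol.used_blocks())      # 1001: only the changed block was copied
```

The reference count is what allows snapshots to be deleted in any order: deleting the original does not free blocks a REFLINK still uses.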

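Because the inodes share the same extent pointers, creating a REFLINK snapshot takes up very little space.  Note:  the 'du' utility will show both the original file and the REFLINK as taking up the full amount of space.  This is because 'du' adds up the blocks allocated to each file separately and does not know that REFLINKs share them.  Yet.  The 'df' command, which reports the blocks in use on the whole volume, shows the real picture.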
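The du/df discrepancy comes from where each tool gets its numbers.  The Python sketch below mimics both with standard library calls.  du_bytes sums each file's st_blocks from stat(), and df_used_bytes derives used space from the file system totals returned by statvfs().  Both function names are hypothetical.

```python
import os


def du_bytes(paths: list[str]) -> int:
    """Add up the space allocated to each path separately, as du does for distinct files.

    st_blocks is in 512-byte units on Linux. Blocks shared between REFLINKed
    files are counted once per file, because each REFLINK is a separate inode.
    (GNU du also skips repeated hard links to one inode; this sketch does not.)
    """
    return sum(os.stat(p).st_blocks * 512 for p in paths)


def df_used_bytes(mount_point: str) -> int:
    """Space in use on the whole file system, derived from statvfs the way df does."""
    st = os.statvfs(mount_point)
    return (st.f_blocks - st.f_bfree) * st.f_frsize
```

On a volume like the /var/data example, you could call du_bytes(["test", "test.1"]) and df_used_bytes("/var/data") before and after running reflink.  You would expect the du figure to roughly double while the df figure stays essentially flat.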

Here is an example of how to create a REFLINK.  We will start with a 1G file named test, create a REFLINK called test.1, then append a line to each file.  Notice the output of the du and df commands: du counts the data twice, while df shows no visible increase in disk usage.

[john@localhost REFLINKs]$ du -sh *
1000M   test

[john@localhost REFLINKs]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             8.0G  4.0G  4.1G  49% /var/data

[john@localhost REFLINKs]$ reflink test test.1

[john@localhost REFLINKs]$ du -sh *
1000M   test
1000M   test.1

[john@localhost REFLINKs]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             8.0G  4.0G  4.1G  49% /var/data

[john@localhost REFLINKs]$ echo "A" >> test

[john@localhost REFLINKs]$ echo "B" >> test.1

[john@localhost REFLINKs]$ du -sh *
1001M   test
1001M   test.1

[john@localhost REFLINKs]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             8.0G  4.0G  4.1G  49% /var/data

Even adding a dozen more REFLINKs will not noticeably change the output of the df command.  REFLINKed files use a copy-on-write mechanism: data blocks stay shared until one of the files is modified, and only the modified blocks are then copied, so disk usage grows only as the linked snapshots diverge.

Why is this useful?

There are a number of reasons to use snapshots.  Any environment that works with large files, and with slightly different variations of those large files, can benefit.  For example:
  • Test configurations where large input files vary only slightly.  Create the first input file, create a REFLINK of it, and make your modifications to the REFLINK instead of copying the whole file.
  • Backups of large files that change infrequently, or change only slightly.  Run the reflink command to capture a point-in-time copy of the file, and name the copy with the current date.  Example:  reflink data.dbf data.dbf.11802010.
  • Quick and dirty version control.  Need to keep an original copy of a file before you start mucking about?  reflink test.conf test.conf.original
  • Incorporate the functionality into your own product.  ISVs building applications on Oracle Linux get snapshotting capability for free, and can use it to build robust snapshot facilities into their custom applications.
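The dated-backup idea above is easy to automate, for instance from a cron job.  Here is a minimal Python sketch that shells out to the same reflink command used in this article.  The helper names dated_reflink_argv and snapshot are hypothetical.  So is the YYYYMMDD suffix, chosen because it sorts chronologically in a directory listing.

```python
import datetime
import subprocess
from typing import List, Optional


def dated_reflink_argv(src: str, when: Optional[datetime.date] = None) -> List[str]:
    """Build the arguments for: reflink SRC SRC.YYYYMMDD (hypothetical naming scheme)."""
    if when is None:
        when = datetime.date.today()
    return ["reflink", src, f"{src}.{when.strftime('%Y%m%d')}"]


def snapshot(src: str, when: Optional[datetime.date] = None) -> str:
    """Run reflink on an OCFS2 volume and return the new snapshot's file name."""
    argv = dated_reflink_argv(src, when)
    subprocess.run(argv, check=True)  # raises CalledProcessError if reflink fails
    return argv[2]
```

For example, snapshot("data.dbf") run in the file's directory would produce a file such as data.dbf.20100118, sharing all of its blocks with data.dbf until either file changes.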
For more information, check out the OCFS2 project page on oss.oracle.com.
