Friday Jan 04, 2013


dlmfs is a really nifty feature of OCFS2. Basically, it's a virtual filesystem that lets a user or program use the DLM (distributed lock manager) through simple filesystem operations. Without having to write programs that link with cluster libraries or do complex things, you can literally write a few lines of Python, Java or C code that create locks across a number of servers. We use this feature in Oracle VM to coordinate the master server and the locking of VMs across multiple nodes in a cluster. It allows us to make sure that a VM cannot start on multiple servers at once. Every VM is backed by a DLM lock, and with dlmfs that lock is simply a file in the dlmfs filesystem.
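
As a taste of how little code this takes, here is a minimal sketch of a non-blocking "try lock" helper. It assumes dlmfs is mounted and the domain directory for the lock already exists. The name try_lock and its parameters are made up for this illustration and are not part of the Oracle VM agent. It relies on the open(2) behavior described in the kernel's dlmfs documentation: O_RDWR asks for an exclusive lock, O_RDONLY for a shared read lock, and O_NONBLOCK turns the request into a trylock that fails with ETXTBSY when the lock cannot be granted.

```python
import errno
import os


def try_lock(lock_path, exclusive=True):
    """Try to take a dlmfs lock without blocking.

    Returns True if the lock was granted and False if it is held
    elsewhere in a conflicting mode. Any other error is re-raised.
    """
    # create the lock file first if it does not exist yet
    if not os.path.exists(lock_path):
        os.close(os.open(lock_path, os.O_CREAT | os.O_NONBLOCK))

    # O_RDWR asks for an exclusive lock, O_RDONLY for a shared read lock
    mode = os.O_RDWR if exclusive else os.O_RDONLY
    try:
        # O_NONBLOCK makes this a trylock instead of a blocking request
        fd = os.open(lock_path, mode | os.O_NONBLOCK)
    except OSError as e:
        if e.errno == errno.ETXTBSY:
            return False
        raise
    # closing the descriptor does not drop the lock, removing the file does
    os.close(fd)
    return True
```

Calling try_lock("/dlm/mycluster/mylock") on two nodes should return True on one of them only. The path is just an example, any file name inside a domain directory will do.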

To show you how easy and powerful this is, I took some of the Oracle VM agent Python code. It is a very simple example of how to create a lock domain and a lock, and how to tell whether you got the lock or not. The focus here is just a master lock, which you could use for an agent that is responsible for a virtual IP or for some executable that you want to run on a given server, but the calls to create any kind of lock are in the code. Anyone who wants to experiment with this can add their own bits in a matter of minutes.

The prerequisite is simple: take a number of servers, configure an ocfs2 volume and an ocfs2 cluster (see previous blog entries) and run the script. You do not have to set up an ocfs2 volume if you do not want to; you could just set up the domain without actually mounting the filesystem (see the global heartbeat blog). So in practice this can be done with a very small, simple setup.

My example has two nodes, wcoekaer-emgc1 and wcoekaer-emgc2, both running Oracle Linux 6 and configured with a shared disk and a mounted ocfs2 filesystem. This setup ensures that the dlmfs kernel module is loaded and the cluster is online. Take the Python code listed here and just execute it on both nodes.

[root@wcoekaer-emgc2 ~]# lsmod |grep ocfs
ocfs2                1092529  1 
ocfs2_dlmfs            20160  1 
ocfs2_stack_o2cb        4103  1 
ocfs2_dlm             228380  1 ocfs2_stack_o2cb
ocfs2_nodemanager     219951  12 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
ocfs2_stackglue        11896  3 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb
configfs               29244  2 ocfs2_nodemanager
jbd2                   93114  2 ocfs2,ext4
You see that the ocfs2_dlmfs kernel module is loaded.

[root@wcoekaer-emgc2 ~]# mount |grep dlmfs
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
The dlmfs virtual filesystem is mounted on /dlm.
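
The same check can be done from a program instead of by eye. The sketch below looks for a mount of type ocfs2_dlmfs in the kernel's mount table. It assumes the usual Linux /proc/mounts layout (device, mount point, filesystem type, options), and the helper names find_dlmfs_mountpoint and dlmfs_mountpoint are made up for this illustration.

```python
def find_dlmfs_mountpoint(mounts_text):
    """Return the first mount point of type ocfs2_dlmfs, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[2] == "ocfs2_dlmfs":
            return fields[1]
    return None


def dlmfs_mountpoint(proc_mounts="/proc/mounts"):
    """Read the mount table and return where dlmfs is mounted, or None."""
    with open(proc_mounts) as f:
        return find_dlmfs_mountpoint(f.read())
```

On a node set up like the ones above, dlmfs_mountpoint() would be expected to return "/dlm", and None on a machine where dlmfs is not mounted.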

I now execute the script on both nodes and show some output. After a while I kill (control-c) the script on the master node, and you see the other node take over the lock. I then restart the script and reboot the other node, and you see the same.

[root@wcoekaer-emgc1 ~]# ./ 
Checking DLM
DLM Ready - joining domain : mycluster
Starting main loop...
i am master of the multiverse
i am master of the multiverse
i am master of the multiverse
i am master of the multiverse
^Ccleaned up master lock file
[root@wcoekaer-emgc1 ~]# ./ 
Checking DLM
DLM Ready - joining domain : mycluster
Starting main loop...
i am not the master
i am not the master
i am not the master
i am not the master
i am master of the multiverse
This shows that I started as master and then hit ctrl-c, which drops the lock, so the other node takes it. When I restart the script I am not the master at first; then I reboot the other node and I take the lock again.

[root@wcoekaer-emgc2 ~]# ./
Checking DLM
DLM Ready - joining domain : mycluster
Starting main loop...
i am not the master
i am not the master
i am not the master
i am not the master
i am master of the multiverse
i am master of the multiverse
i am master of the multiverse
[1]+  Stopped                 ./
[root@wcoekaer-emgc2 ~]# bg
[1]+ ./ &
[root@wcoekaer-emgc2 ~]# reboot -f
Here you see that this node started without being master, became master at the time of the ctrl-c on the other node, and then, after a forced reboot, the lock automatically gets released.

And here is the code, just copy it to your servers and execute it...

# Copyright (C) 2006-2012 Oracle. All rights reserved.
# This program is free software; you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free Software
# Foundation, version 2.  This program is distributed in the hope that it will
# be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
# Public License for more details.  You should have received a copy of the GNU
# General Public License along with this program; if not, write to the Free
# Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
# 02111-1307, USA.

import sys
import subprocess
import stat
import time
import os
import re
import socket
from time import sleep

from os.path import join, isdir, exists

# defines
# dlmfs is where the dlmfs filesystem is mounted
# the default, normal place for ocfs2 setups is /dlm
# ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
DLMFS = "/dlm"

# we need a domain name which really is just a subdir in dlmfs
# default to "mycluster" so then it creates /dlm/mycluster
# locks are created inside this directory/domain
DLM_DOMAIN_NAME = "mycluster"

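# the domain path is the directory of the domain inside dlmfs
DLM_DOMAIN_PATH = join(DLMFS, DLM_DOMAIN_NAME)

# the main lock to use for being the owner of a lock
# this can be any name, the filename is just the lockname
# NOTE: the lock name "master" is an assumption, any file name works
DLM_LOCK_MASTER = join(DLM_DOMAIN_PATH, "master")
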
# just a timeout

def run_cmd(cmd, success_return_code=(0,)):
    if not isinstance(cmd, list):
        raise Exception("Only accepts list!")
    cmd = [str(x) for x in cmd]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, close_fds=True)
    (stdoutdata, stderrdata) = proc.communicate()
    if proc.returncode not in success_return_code:
        raise RuntimeError('Command: %s failed (%s): stderr: %s stdout: %s'
                           % (cmd, proc.returncode, stderrdata, stdoutdata))
    return str(stdoutdata)

def dlm_ready():
    """
    Indicate if the DLM is ready or not.

    With dlmfs, the DLM is ready once the DLM filesystem is mounted
    under /dlm.

    @return: C{True} if the DLM is ready, C{False} otherwise.
    @rtype: C{bool}
    """
    return os.path.ismount(DLMFS)

# just do a mkdir, if it already exists, we're good, if not just create it
def dlm_join_domain(domain=DLM_DOMAIN_NAME):
    _dir = join(DLMFS, domain)
    if not isdir(_dir):
        os.mkdir(_dir)
    # else: already joined

# leaving a domain is basically removing the directory.
def dlm_leave_domain(domain=DLM_DOMAIN_NAME, force=True):
    _dir = join(DLMFS, domain)
    if force:
        cmd = ["rm", "-fr", _dir]
    else:
        cmd = ["rmdir", _dir]
    run_cmd(cmd)

# acquire a lock
def dlm_acquire_lock(lock):

    # a lock is a filename in the domain directory
    lock_path = join(DLM_DOMAIN_PATH, lock)

    try:
        if not exists(lock_path):
            fd = os.open(lock_path, os.O_CREAT | os.O_NONBLOCK)
            os.close(fd)
        # create the EX lock
        # creating a file with O_RDWR causes an EX lock
        fd = os.open(lock_path, os.O_RDWR | os.O_NONBLOCK)
        # once the file is created in this mode, you can close it
        # and you still keep the lock
        os.close(fd)
    except Exception:
        # we did not get the lock, do not leave the lock file behind
        if exists(lock_path):
            os.remove(lock_path)
        raise

def dlm_release_lock(lock):
    # releasing a lock is as easy as just removing the file
    lock_path = join(DLM_DOMAIN_PATH, lock)
    if exists(lock_path):
        os.remove(lock_path)

def acquire_master_dlm_lock():
    ETXTBUSY = 26


    # close() does not downconvert the lock level nor does it drop the lock. The
    # holder still owns the lock at that level after close.
    # close() allows any downconvert request to succeed.
    # However, a downconvert request is only generated for queued requests. And
    # O_NONBLOCK is specifically a noqueue dlm request.

    # 1) O_CREAT | O_NONBLOCK will create a lock file if it does not exist, whether
    #    we are the lock holder or not.
    # 2) if we hold O_RDWR lock, and we close but not delete it, we still hold it.
    #    afterward, O_RDWR will succeed, but O_RDWR | O_NONBLOCK will not.
    # 3) but if we do not hold the lock, O_RDWR will hang there waiting,
    #    which is not desirable -- any uninterruptible hang is undesirable.
    # 4) if nobody else holds the lock either, but the lock file exists as a side
    #    effect of 1), with O_NONBLOCK, it may result in ETXTBUSY

    # a) we need O_NONBLOCK to avoid scenario (3)
    # b) we need to delete it ourselves to avoid (2)
    #   *) if we do not succeed with (1), remove the lock file to avoid (4)
    #   *) if everything is good, we drop it and we remove it
    #   *) if killed by a program, this program should remove the file
    #   *) if crashed, but not rebooted, something needs to remove the file
    #   *) on reboot/reset the lock is released to the other node(s)

    try:
        if not exists(DLM_LOCK_MASTER):
            fd = os.open(DLM_LOCK_MASTER, os.O_CREAT | os.O_NONBLOCK)
            os.close(fd)

        master_lock = os.open(DLM_LOCK_MASTER, os.O_RDWR | os.O_NONBLOCK)
        # closing the file does not drop the lock (see the notes above)
        os.close(master_lock)

        print("i am master of the multiverse")
        # at this point, I know I am the master and I can add code to do
        # things that only a master can do, such as, consider setting
        # a virtual IP or, if I am master, I start a program
        # and if not, then I make sure I don't run that program (or VIP)
        # so the magic starts here...
        return True

    except OSError as e:
        if e.errno == ETXTBUSY:
            print("i am not the master")
            # if we are not master and the file exists, remove it or
            # we will never succeed
            if exists(DLM_LOCK_MASTER):
                os.remove(DLM_LOCK_MASTER)
            return False
        else:
            raise

def release_master_dlm_lock():
    if exists(DLM_LOCK_MASTER):
        os.remove(DLM_LOCK_MASTER)

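def run_forever():
    # seconds to wait between attempts to take the master lock
    # (an assumed value, tune it for your setup)
    delay = 5

    print("Checking DLM")
    if dlm_ready():
        print("DLM Ready - joining domain : " + DLM_DOMAIN_NAME)
    else:
        print("DLM not ready - bailing, fix the cluster stack")
        sys.exit(1)

    dlm_join_domain()

    print("Starting main loop...")

    while True:
        try:
            acquire_master_dlm_lock()
            sleep(delay)
        except Exception as e:
            # unexpected error: report it and try again after the delay
            print("error: %s" % e)
            sleep(delay)
        except (KeyboardInterrupt, SystemExit):
            if exists(DLM_LOCK_MASTER):
                os.remove(DLM_LOCK_MASTER)
            # if you control-c out of this, then you lose the lock!
            # delete it on exit for release
            print("cleaned up master lock file")
            sys.exit(1)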

if __name__ == '__main__':
    run_forever()

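The notes in the script say that a program killed by another program should still remove its lock file. One way to get there, sketched here under the assumption that removing the file is all the cleanup that is needed, is a small context manager plus a SIGTERM handler. The names lock_file_cleanup and exit_on_sigterm are made up for this illustration and are not part of the Oracle VM agent code.

```python
import contextlib
import os
import signal
import sys


@contextlib.contextmanager
def lock_file_cleanup(lock_path):
    """Remove lock_path when the block exits, normally or via an exception."""
    try:
        yield lock_path
    finally:
        # on dlmfs, removing the lock file is what releases the lock
        if os.path.exists(lock_path):
            os.remove(lock_path)


def exit_on_sigterm():
    """Turn SIGTERM into SystemExit so that finally blocks get to run."""
    # must be called from the main thread
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(1))
```

A main loop wrapped in "with lock_file_cleanup(path):" then drops the lock on control-c, on a plain kill, and on any unexpected exception. A kill -9 or a crash still leaves the file behind until something removes it or the node reboots.
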
Wim Coekaerts is the Senior Vice President of Linux and Virtualization Engineering for Oracle. He is responsible for Oracle's complete desktop to data center virtualization product line and the Oracle Linux support program.

You can follow him on Twitter at @wimcoekaerts

