Having a shared file system is a very common requirement, for example to let multiple applications access the same data or to let multiple users work on the same information at the same time. On-premises this is an easy task to achieve using NAS or SAN devices, but how can it be done in the cloud?
There are different technologies, such as iSCSI, NFS, SMB, and DRBD, that allow you to share a block device between two or more cloud instances, but you still need to configure those services, and on top of that you also need a cluster file system such as OCFS2 or GlusterFS so that your users can read and write simultaneously.
With Oracle Cloud Infrastructure you have the multi-attach block volume option, which allows you to attach the same block volume to two or more cloud instances. This feature is under Limited Availability, which means your tenancy needs to be enabled before you can use it. It allows customers to easily connect the same block storage volume(s) to all the instances that need access to the same data, essentially acting as a NAS device in the cloud.
As of today, the process is done through a preview version of the OCI CLI, which needs to be requested from Oracle. Once you have access to that OCI CLI version and your tenancy has been enabled for the feature, you can run the OCI command line to attach a block volume to each of the cloud instances that will host your cluster file system. Here is an example:
"oci compute volume-attachment attach --instance-id ocid1.instance.oc1.OCID --type iscsi --volume-id ocid1.volume.oc1.REGION.OCID --is-shareable true"
Now that you have your block volume attached to all the instances that need it, the next step is to create a file system that is cluster aware. For this blog we will use OCFS2 (Oracle Cluster File System).
Oracle Cluster File System version 2 (OCFS2) is a general-purpose shared-disk file system intended for use in clusters to increase storage performance and availability. Almost any application can use OCFS2 because it provides local file-system semantics. Applications that are cluster-aware can use cache-coherent parallel I/O from multiple cluster nodes to balance activity across the cluster, or they can use the available file-system functionality to fail over and run on another node in the event that a node fails.
OCFS2 has a large number of features that make it suitable for deployment in an enterprise-level computing environment:
- Support for ordered and write-back data journaling that provides file system consistency in the event of power failure or system crash.
- Block sizes ranging from 512 bytes to 4 KB, and file-system cluster sizes ranging from 4 KB to 1 MB (both in increments of powers of 2). The maximum supported volume size is 16 TB, which corresponds to a cluster size of 4 KB. A volume size as large as 4 PB is theoretically possible for a cluster size of 1 MB, although this limit has not been tested.
- Extent-based allocations for efficient storage of very large files.
- Optimized allocation support for sparse files, inline-data, unwritten extents, hole punching, reflinks, and allocation reservation for high performance and efficient storage.
- Indexing of directories to allow efficient access to a directory even if it contains millions of objects.
- Metadata checksums for the detection of corrupted inodes and directories.
- Extended attributes to allow an unlimited number of name:value pairs to be attached to file system objects such as regular files, directories, and symbolic links.
- Advanced security support for POSIX ACLs and SELinux in addition to the traditional file-access permission model.
- Support for user and group quotas.
- Support for heterogeneous clusters of nodes with a mixture of 32-bit and 64-bit, little-endian (x86, x86_64, ia64) and big-endian (ppc64) architectures.
- An easy-to-configure, in-kernel cluster stack (O2CB) with a distributed lock manager (DLM), which manages concurrent access from the cluster nodes.
- Support for buffered, direct, asynchronous, splice and memory-mapped I/O.
- A tool set that uses similar parameters to the ext3 file system.
Below is a summary of the configuration steps required for this architecture:
1. Attach your Oracle Cloud Infrastructure block volume(s) using the OCI CLI, as explained above.
2. Set up your OCFS2/O2CB cluster nodes.
3. Create your OCFS2 file system and mount point.
You also need to open ports 7777 and 3260 on the Oracle Cloud Infrastructure Dashboard. Edit the VCN security list and either open all ports for your tenancy's internal network (NOT THE PUBLIC NETWORK), as shown below for network 126.96.36.199/16:
IP Protocol: All Protocols
Allows: all traffic for all ports
or open only the required ports 7777 and 3260 for the internal network. Here is an example for port 7777:
IP Protocol: TCP
Source Port Range: All
Destination Port Range: 7777
Allows: TCP traffic for ports: 7777
NOTE: Ports 7777 and 3260 need to be opened in the local OS firewall as well, as shown below:
$ sudo firewall-cmd --zone=public --permanent --add-port=7777/tcp
$ sudo firewall-cmd --zone=public --permanent --add-port=3260/tcp
$ sudo firewall-cmd --complete-reload
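To confirm the rules are active after the reload, you can list the ports currently open in the zone (the output should include 7777/tcp and 3260/tcp):

$ sudo firewall-cmd --zone=public --list-ports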
Make sure DNS is working properly and that your bare metal instances can communicate across your tenancy's availability domains (ADs). Here is a quick example of /etc/resolv.conf based on this setup:
$ cat /etc/resolv.conf
; generated by /usr/sbin/dhclient-script
search baremetal.oraclevcn.com publicsubnetad3.baremetal.oraclevcn.com publicsubnetad1.baremetal.oraclevcn.com publicsubnetad2.baremetal.oraclevcn.com
As you can see above, the DNS entries for all ADs are present in that resolv.conf file.
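A quick way to confirm cross-AD name resolution and connectivity is to look up and ping each peer node by name; this assumes your instance hostnames (node1 and node2 in the rest of this post) are registered in the VCN DNS:

$ nslookup node2
$ ping -c 3 node2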
Both cluster nodes used in this setup, node1 and node2, run Oracle Linux 7.4 x86_64.
Creating the Configuration File for the Cluster Stack
Install the required OCFS2 packages:
$ sudo yum install ocfs2-tools-devel ocfs2-tools -y
Now, create the configuration file by using the o2cb command or a text editor. Let's use the following command to create a cluster definition:
$ sudo o2cb add-cluster ociocfs2
The above command creates the configuration file /etc/ocfs2/cluster.conf if it does not already exist.
For each node, use the following command to define the node.
$ sudo o2cb add-node ociocfs2 node1 --ip 188.8.131.52
$ sudo o2cb add-node ociocfs2 node2 --ip 184.108.40.206
NOTE: The name of the node must be the same as the value of the system's HOSTNAME that is configured in /etc/sysconfig/network, and the IP address is the one that the node will use for private communication in the cluster. You need to copy the cluster configuration file /etc/ocfs2/cluster.conf to each node in the cluster. Any changes that you make to the cluster configuration file do not take effect until you restart the cluster stack.
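One simple way to distribute the file is shown below; it assumes the default opc user and that node2 is reachable by name from node1:

$ scp /etc/ocfs2/cluster.conf opc@node2:/tmp/cluster.conf
$ ssh opc@node2 "sudo mkdir -p /etc/ocfs2 && sudo mv /tmp/cluster.conf /etc/ocfs2/cluster.conf"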
The following /etc/ocfs2/cluster.conf configuration file defines a 2-node cluster named ociocfs2 with a local heartbeat which is the configuration used for this tutorial.
$ sudo cat /etc/ocfs2/cluster.conf
cluster:
        heartbeat_mode = local
        node_count = 2
        name = ociocfs2

node:
        number = 0
        cluster = ociocfs2
        ip_port = 7777
        ip_address = 220.127.116.11
        name = node1

node:
        number = 1
        cluster = ociocfs2
        ip_port = 7777
        ip_address = 18.104.22.168
        name = node2
Configuring the Cluster Stack
Run the following command on each node of the cluster:
$ sudo /sbin/o2cb.init configure
Configuring the O2CB driver.
This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot. The current values will be shown in brackets ('[]'). Hitting
<ENTER> without typing an answer will keep that current value. Ctrl-C
will abort.
Load O2CB driver on boot (y/n) [y]:
Cluster stack backing O2CB [o2cb]:
Cluster to start on boot (Enter "none" to clear) [ocfs2]: ociocfs2
Specify heartbeat dead threshold (>=7) :
Specify network idle timeout in ms (>=5000) :
Specify network keepalive delay in ms (>=1000) :
Specify network reconnect delay in ms (>=2000) :
Writing O2CB configuration: OK
Setting cluster stack "o2cb": OK
Registering O2CB cluster "ociocfs2": OK
Setting O2CB cluster timeouts : OK
Starting global heartbeat for cluster "ociocfs2": OK
An explanation of the above options can be found in the OCFS2 public documentation.
To verify the settings for the cluster stack, enter the /sbin/o2cb.init status command:
$ sudo /sbin/o2cb.init status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster "ociocfs2": Online
Heartbeat dead threshold: 31
Network idle timeout: 30000
Network keepalive delay: 2000
Network reconnect delay: 2000
Heartbeat mode: Local
Checking O2CB heartbeat: Active
Debug file system at /sys/kernel/debug: mounted
In this example, the cluster is online and is using local heartbeat mode. If no volumes have been configured, the O2CB heartbeat is shown as Not Active rather than Active.
Configure the o2cb and ocfs2 services so that they start at boot time after networking is enabled.
$ sudo systemctl enable o2cb
$ sudo systemctl enable ocfs2
These settings allow the node to mount OCFS2 volumes automatically when the system starts.
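As a quick check, you can confirm that both units are enabled:

$ systemctl is-enabled o2cb ocfs2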
Configuring the Kernel for Cluster Operation
For the correct operation of the cluster, you must configure the kernel settings described below.
kernel.panic: Specifies the number of seconds after a panic before a system will automatically reset itself. If the value is 0, the system hangs, which allows you to collect detailed information about the panic for troubleshooting; this is the default value. To enable automatic reset, set a non-zero value. If you require a memory image (vmcore), allow enough time for Kdump to create this image. The suggested value is 30 seconds, although large systems will require a longer time.
kernel.panic_on_oops: Specifies that a system must panic if a kernel oops occurs. If a kernel thread required for cluster operation crashes, the system must reset itself. Otherwise, another node might not be able to tell whether a node is slow to respond or unable to respond, causing cluster operations to hang.
On each node, enter the following commands to set the recommended values for panic and panic_on_oops:
$ sudo sysctl kernel.panic=30
$ sudo sysctl kernel.panic_on_oops=1
To make the change persist across reboots, add the following entries to the /etc/sysctl.conf file:
# Define panic and panic_on_oops for cluster operation
kernel.panic = 30
kernel.panic_on_oops = 1
Starting and Stopping the Cluster Stack
The following commands can be used to perform various operations on the cluster stack:
- /sbin/o2cb.init status : Check the status of the cluster stack.
- /sbin/o2cb.init online : Start the cluster stack.
- /sbin/o2cb.init offline : Stop the cluster stack.
- /sbin/o2cb.init unload : Unload the cluster stack.
Creating OCFS2 Volumes
Use the mkfs.ocfs2 command to create an OCFS2 volume on a device. If you want to label the volume and mount it by specifying the label, the device must correspond to a partition. You cannot mount an unpartitioned disk device by specifying a label.
$ sudo mkfs.ocfs2 -L "ocfs2" /dev/sdb
Cluster stack: classic o2cb
Features: sparse extended-slotmap backup-super unwritten inline-data strict-journal-super xattr indexed-dirs refcount discontig-bg
Block size: 4096 (12 bits)
Cluster size: 4096 (12 bits)
Volume size: 12455405158400 (3040870400 clusters) (3040870400 blocks)
Cluster groups: 94274 (tail covers 512 clusters, rest cover 32256 clusters)
Extent allocator size: 780140544 (186 groups)
Journal size: 268435456
Node slots: 16
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 6 block(s)
Formatting Journals: done
Growing extent allocator: done
Formatting slot map: done
Formatting quota files: done
Writing lost+found: done
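Before mounting, you can optionally verify from each node that the shared disk is recognized as an OCFS2 volume; mounted.ocfs2 (part of ocfs2-tools) lists detected OCFS2 devices with their label and UUID:

$ sudo mounted.ocfs2 -d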
Mounting OCFS2 Volumes
As shown in the following example, specify the "_netdev" option in /etc/fstab if you want the system to mount an OCFS2 volume at boot time after networking is started, and to unmount the file system before networking is stopped. You can also add the "nofail" option so that the boot does not fail if the volume is not available.
$ sudo mkdir /ocfs2
$ sudo vi /etc/fstab
#include the below line to mount your ocfs2 after a restart
/dev/sdb /ocfs2 ocfs2 _netdev,defaults 0 0
Run "mount -a" to mount the OCFS2 partition based on the fstab entry you created above and the setup is concluded. You should have a cluster file system mounted on /ocfs2 on both Oracle Linux 7.4 node1 and node2 servers.
Finally, you're finished! Your applications can now use this storage as they would with any local file storage. Planning your environment thoughtfully and making use of Availability Domains and capabilities such as Oracle Cluster File System can help you increase the performance and availability of the solutions you build on Oracle Cloud Infrastructure.