Wednesday Jul 11, 2012

Oracle VM VirtualBox virtual appliance images for Oracle VM 3.1.1 server and Manager

I updated the Oracle VM VirtualBox appliances for Oracle VM Manager. They now contain the latest release and patch, Oracle VM Manager 3.1.1 build 365. Alongside the Manager VM I also created a preconfigured server setup: Oracle VM Server 3.1.1 build 365. The nice thing about this combination is that you can effectively run a small server pool on your desktop or laptop if you have a decent amount of RAM. I managed to create a 2-node server pool, basically running the Manager VM + 2 server VMs on one 8GB MacBook. Of course it wasn't terribly fast or useful for running anything serious, but it was good enough to test HA and show off the functionality.

The VMs can be downloaded here. There are a few important things to note:

  • You can only run paravirtualized guests in the server VMs
  • I precreated NFS directories and started the NFS server bits in the Manager VM so you can use the Manager as an NFS repository for shared storage for the servers
  • I created a yum directory and updated httpd.conf on the Manager VM so that you can use and test the yum update facilities
  • You can add extra virtual disks to the server VMs and they will show up as local storage for local repositories
  • iSCSI target modules are installed on the Manager VM so you can also test out iSCSI if you want to
  • It is highly recommended to start the Manager VM with 4GB of RAM; once it is up and running you can, if you want, drop it to about 3GB
  • The server VMs can run in as little as 700MB-1GB of RAM
  • Use STATIC IP addresses for these VMs
  • Since this is build 365, the Oracle VM CLI (ssh) server is also installed in the Manager VM!
  • Note: when creating the VM in VirtualBox, go into the network settings of the VM and make sure that your virtual network is associated with the correct physical network. These VMs were exported on a Linux server with the virtual networks bound to eth0 as a device. On a Mac or a Windows PC this device likely has a different name, so to be safe just modify this before starting the VM (see the sketch below)
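
    If you would rather script that network fix than click through the GUI, something along these lines should do it; this is a minimal sketch using VBoxManage, where "OVM-Manager" stands in for whatever name you gave the imported VM and en0 is just an example host interface name.

        # list the host interfaces VirtualBox can bridge to (eth0 on Linux,
        # en0/en1 on a Mac, adapter display names on Windows)
        VBoxManage list bridgedifs

        # re-bind the VM's first NIC to the correct host device before first boot
        VBoxManage modifyvm "OVM-Manager" --nic1 bridged --bridgeadapter1 en0

        # double-check the result
        VBoxManage showvminfo "OVM-Manager" | grep -i nic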

    The Manager VM behaves like the previous 3.0 version. The VM starts, auto-logs in to X Windows, opens a readme file in Firefox, and shows a login button that points Firefox at the local management port for Oracle VM.

    The server root password and the ovs-agent password (used to discover the server) are both ovsroot.

    Thursday Jul 05, 2012

    ASMLib

    (updated 3/18/13 to fix a disk naming error) Oracle ASMLib on Linux has been a topic of discussion a number of times since it was first released back in 2004. There is a lot of confusion around it and certainly a lot of misinformation out there, for no good reason. Let me try to give a bit of history around Oracle ASMLib.

    Oracle ASMLib was introduced at the time Oracle released Oracle Database 10g R1. 10gR1 introduced a very cool and important new feature called Oracle ASM (Automatic Storage Management). A very simplistic description would be that this is a very sophisticated volume manager for Oracle data: give your devices directly to the ASM instance and it manages the storage for you, clustered, highly available, redundant, with good performance, etc, etc... We recommend using Oracle ASM for all database deployments, single instance or clustered (RAC).

    The ASM instance manages the storage, and every Oracle server process opens and operates on the storage devices the same way it would open and operate on regular datafiles or raw devices. So by default, from 10gR1 up to today, we do not interact with ASM-managed block devices any differently than we did before with a datafile mapped to a raw device. All of this is without ASMLib, so ignore that one for now. Standard Oracle on any platform that we support (Linux, Windows, Solaris, AIX, ...) does it the exact same way: you start an ASM instance, it handles storage management, and all the database instances use and open that storage and read/write from/to it. There are no extra pieces of software needed, including on Linux. ASM is fully functional and self-contained without any other components.
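
    To make that concrete, here is a minimal sketch of handing raw devices to ASM and then using the disk group from a database; the device, disk group and tablespace names are made up, and the exact connect syntax depends on the database version.

        -- in SQL*Plus, connected to the ASM instance (for example "sqlplus / as sysasm"):
        -- create a disk group directly on top of the shared block devices
        CREATE DISKGROUP data NORMAL REDUNDANCY
          DISK '/dev/sdf' NAME data_disk1,
               '/dev/sdg' NAME data_disk2;

        -- from a regular database instance, datafiles then simply live in the disk group
        CREATE TABLESPACE app_data DATAFILE '+DATA' SIZE 100M;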

    In order for the admin to provide a raw device to ASM or to the database, it has to have persistent device naming. If you booted up a server where a raw disk was named /dev/sdf and you gave it to ASM (or even just created a tablespace, without ASM, on that device with datafile '/dev/sdf'), and the next time you boot up that device is now /dev/sdg, you end up with an error. Just like you can't simply change datafile names, you can't change device filenames without telling the database, or ASM. Persistent device naming on Linux, especially back in those days, was, to say it bluntly, a nightmare. In fact there were a number of issues (dating back to 2004):

    Correction to the above: ASM can handle device name changes across reboots; with the correct ASM_DISKSTRING in the init.ora it will be able to find the disks even if their names changed. However, part of device naming and device metadata on reboot is the correct ownership (oracle:dba, ...), and that is where things get painful (a sketch of what admins had to maintain by hand follows the list below). With ASMLib in place this is not an issue, as it takes care of ownership and permissions of the ASM disk devices.

  • Linux async IO wasn't pretty
  • persistent device naming including permissions (had to be owned by oracle and the dba group) was very, very difficult to manage
  • system resource usage in terms of open file descriptors

    So given the above, we tried to find a way to make this easier on the admins, in many ways similar to why we started working on OCFS a few years earlier: how can we make life easier for the admins on Linux?
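
    To give a sense of what that manual work looked like without ASMLib, here is a rough sketch; the rule file name, SCSI serial and symlink name are made up, and the exact udev keys and syntax varied quite a bit between udev versions of that era, so treat this as illustrative only.

        # /etc/udev/rules.d/99-oracle-asmdevices.rules (illustrative, modern-ish udev syntax)
        # pin a stable name plus oracle:dba ownership to one specific disk, keyed off
        # its serial number so a /dev/sd* reshuffle at boot doesn't matter
        KERNEL=="sd?", SUBSYSTEM=="block", ENV{ID_SERIAL}=="36000c29fexample01", \
            SYMLINK+="asmdisk1", OWNER="oracle", GROUP="dba", MODE="0660"

        # and in the ASM instance parameter file, point discovery at the stable names:
        #   asm_diskstring = '/dev/asmdisk*'

    Multiply that by every LUN on every node, keep it in sync as storage changes, and the appeal of something simpler becomes obvious.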

    A feature of Oracle ASM is the ability for third parties to write an extension using what's called ASMLib. It is possible for any third-party OS or storage vendor to write a library, using a specific Oracle-defined interface, that gets used by the ASM instance and by the database instance when available. This interface offered two components:

  • Define an IO interface - allow any IO to the devices to go through ASMLib
  • Define device discovery - implement an external way of discovering, labeling devices to provide to ASM and the Oracle database instance

    This is similar to a library that a number of companies have implemented over the years called libODM (Oracle Disk Manager). ODM was specified many years before we introduced ASM and allowed third-party vendors to implement their own IO routines, so that the database would use that library, if installed, and call its open/read/write/close routines instead of the standard OS interfaces. PolyServe back in the day used this to optimize their storage solution, and Veritas used (and I believe still uses) it for their filesystem. It basically allowed, in particular, filesystem vendors to write libraries that could optimize access to their storage or filesystem. So ASMLib was not something new; it was basically based on the same model. You have libodm for just database access, you have libasm for ASM/database access.

    Since this library interface existed, we decided to do a reference implementation on Linux. We wrote an ASMLib for Linux that could be used on any Linux platform, and other vendors could see how this worked and potentially implement their own solution. As I mentioned earlier, ASMLib and ODM are library interfaces for third-party extensions. ASMLib for Linux, since it was a reference implementation, implemented both interfaces: the storage discovery part and the IO part. There are two components:

  • Oracle ASMLib - the userspace library with config tools (a shared object and some scripts)
  • oracleasm.ko - a kernel module that implements the asm device for /dev/oracleasm/*

    The userspace library is a binary-only module, since it links with and contains Oracle header files, but it is generic: we only have one asm library for the various Linux platforms. This library is opened by Oracle ASM and by Oracle database processes, and it interacts with the OS through the asm device (/dev/oracleasm). It can be installed on Oracle Linux, on SuSE SLES, on Red Hat RHEL, etc. The library itself doesn't actually care much about the OS version; the kernel module and device do. The support tools are simple scripts that allow the admin to label devices and scan for disks and devices. This way you can create an ASM disk labeled foo on, currently, /dev/sdf... So if /dev/sdf disappears and next time comes back as /dev/sdg, we just scan for the label foo, discover it as /dev/sdg, and life goes on without any worry. Also, when the database needs access to the device, we don't have to worry about file permissions or anything; it will be taken care of. So it's a convenience thing.

    Correction: the extra advantage with ASMLib here is that it also takes care of the file permissions and ownership of the device.
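
    In day-to-day use that convenience looks roughly like this; a minimal sketch assuming the oracleasm support tools are installed, with a made-up disk label and device name.

        # one-time setup: tell the driver which user/group should own the disk devices
        /usr/sbin/oracleasm configure -i     # prompts for oracle / dba, load on boot, etc.
        /usr/sbin/oracleasm init             # loads oracleasm.ko and mounts /dev/oracleasm

        # label a partition once; the label sticks to the disk, not to the /dev/sd* name
        /usr/sbin/oracleasm createdisk DATA1 /dev/sdf1

        # on the other nodes, or after a reboot, just rescan and list
        /usr/sbin/oracleasm scandisks
        /usr/sbin/oracleasm listdisks        # prints DATA1

    ASM then discovers the labeled disks through the library (for example with an ASM_DISKSTRING of 'ORCL:*'), regardless of which /dev/sd* name they came up with.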

    The kernel module oracleasm.ko is a Linux kernel module/device driver. It implements a device /dev/oracleasm/* and any and all IO goes through ASMLib -> /dev/oracleasm. This kernel module is obviously a very specific Oracle related device driver but it was released under the GPL v2 so anyone could easily build it for their Linux distribution kernels.

    Advantages for using ASMLib :

  • A good async IO interface for the database; the entire IO interface is based on an optimal async model for performance
  • A single file descriptor per Oracle process, instead of one per device or datafile per process, reducing the overhead of open file handles
  • Device scanning and labeling built in, so you do not have to worry about messing with udev or devlabel, permissions or the like, which can be very complex and error prone

    Just like with OCFS and OCFS2, each kernel version (major or minor) has to get a new build of the device driver. We started out building the oracleasm kernel module RPMs for many distributions: SLES (in fact, in the early days, even for this thing called United Linux) and RHEL. It didn't make sense to push the driver into upstream Linux because it is unique and specific to the Oracle database.

    As it takes a huge effort in terms of build infrastructure, QA and release management to build kernel modules for every architecture, every Linux distribution and every major and minor version, we worked with the vendors to get them to add this tiny kernel module (about 60k of source code) to their infrastructure. The folks at SuSE understood this was good for them, their customers and us, and added it to SLES. So every build coming from SuSE for SLES contains the oracleasm.ko module. We weren't as successful with other vendors, so for quite some time we continued to build it for RHEL, and of course, as we introduced Oracle Linux at the end of 2006, also for Oracle Linux. With Oracle Linux it became easy for us because we just added the code to our build system, and as we churned out Oracle Linux kernels, whether for a public release or for customers that needed a one-off fix and also used ASMLib, we didn't have to do any extra work; it was all nicely integrated.

    With the introduction of Oracle Linux's Unbreakable Enterprise Kernel and our interest in being able to exploit ASMLib more, we started working on a very exciting project called Data Integrity. Oracle (Martin Petersen in particular) worked for many years with the T10 standards committee and storage vendors and implemented Linux kernel support for DIF/DIX, data protection in the Linux kernel (note to those that wonder: yes, it's all in mainline Linux and under the GPL). This basically gave us all the features in the Linux kernel to checksum a data block and send it to the storage adapter, which can then validate that block and checksum in firmware before it sends it over the wire to the storage array, which can do another checksum, and finally to the actual disk, which does a final validation before writing the block to the physical media. So what was missing was the ability for a userspace application (read: the Oracle RDBMS) to write a block that then gets checksummed and validated all the way down to the disk: application to disk.

    Because we have ASMLib, we had an entry point into the Linux kernel, and Martin added support in ASMLib (kernel driver + userspace) for this functionality. Now, this is all based on relatively current Linux kernels; the oracleasm kernel module depends on the main kernel having support for this before we can make use of it. Thanks to UEK, and our ability to ship a more modern, current version of the Linux kernel, we were able to introduce this feature into ASMLib for Linux from Oracle. This, combined with the fact that we build the asm kernel module when we build every single UEK kernel, allowed us to continue improving ASMLib and providing it to our customers.

    So today, we (Oracle) provide Oracle ASMLib for Oracle Linux, and in particular on the Unbreakable Enterprise Kernel. We did the build/testing/delivery of ASMLib for RHEL until RHEL5, but as of RHEL6 we decided that it was too much effort for us to also maintain all the build and test environments for RHEL; we did not have the ability to use the latest kernel features to introduce the Data Integrity functionality, and we didn't want to end up with multiple versions of ASMLib maintained by us. SuSE SLES still builds and ships the oracleasm module and they do all the work, and Red Hat is certainly welcome to do the same. They don't have to rebuild the userspace library; it's really about the kernel module.

    And finally, to reiterate a few important things:

  • Oracle ASM does not in any way require ASMLib to function completely. ASMLib is a small set of extensions, in particular to make device management easier, but no Oracle ASM features are added or removed by enabling or disabling ASMLib. Customers often confuse ASMLib with ASM. Again, ASM exists on every Oracle-supported OS and on every supported Linux OS (SLES, RHEL, OL) without ASMLib
  • The Oracle ASMLib userspace library is available from OTN, and the kernel module is shipped along with OL/UEK for every build and by SuSE for SLES for every one of their builds
  • The ASMLib kernel module was built by us for RHEL4 and RHEL5, but we do not build it for RHEL6, nor for the OL6 RHCK kernel; only for UEK
  • ASMLib for Linux is/was a reference implementation for any third-party vendor to be able to offer, if they want to, their own version for their own OS or storage
  • ASMLib as provided by Oracle for Linux continues to be enhanced and evolve, and for the kernel module we use UEK as the base OS kernel

    Hope this helps.

    Wednesday Jul 04, 2012

    What's up with OCFS2?

    On Linux there are many filesystem choices, and even from Oracle we provide a number of filesystems, all with their own advantages and use cases. Customers often confuse ACFS with OCFS or OCFS2, which then leads to assumptions such as one replacing the other, etc... I thought it would be good to write up a summary of how OCFS2 got to where it is, what we're still up to, how it is different from other options, and how this really is a cool native Linux cluster filesystem that we worked on for many years and that is still widely used.

    Work on a cluster filesystem at Oracle started many years ago, in the early 2000s, when the Oracle Database Cluster development team wrote a cluster filesystem for Windows that was primarily focused on providing an alternative to raw disk devices and helping customers with the deployment of Oracle Real Application Clusters (RAC). Oracle RAC is a cluster technology that lets us make a cluster of Oracle Database servers look like one big database. The RDBMS runs on many nodes and they all work on the same data; it's a shared-disk database design. There are many advantages to doing this, but I will not go into detail as that is not the purpose of my write-up. Suffice it to say that Oracle RAC expects all the database data to be visible in a consistent, coherent way across all the nodes in the cluster. To do that, there were/are a few options: (1) use raw disk devices that are shared through SCSI, FC, or iSCSI; (2) use a network filesystem (NFS); (3) use a cluster filesystem (CFS), which basically gives you a filesystem that's coherent across all nodes using shared disks. It is sort of (but not quite) a combination of options 1 and 2, except that you don't do network access to the files; the files are effectively locally visible as if it were a local filesystem.

    So OCFS (Oracle Cluster FileSystem) on Windows was born. Since Linux was becoming a very important and popular platform, we decided that we would also make this available on Linux, and thus the porting of OCFS/Windows started. The first version of OCFS was really primarily focused on replacing the use of raw devices with a simple filesystem that lets you create files and provides direct IO to these files to get basically native raw disk performance. The filesystem was not designed to be fully POSIX compliant and it did not have anywhere near decent performance for regular file create/delete/access operations. Cache coherency was easy since it was basically always direct IO down to the disk device; this ensured that any time you issue a write() it goes directly down to the disk and does not return until the write() is completed. Same for read(): any sort of read from a datafile would be a read() operation that went all the way to disk and returned. We did not cache any data when it came down to Oracle data files.

    So while OCFS worked well for that, it did not have much of a normal filesystem feel, so it was not something that could be submitted to the kernel mailing list for inclusion into Linux as another native Linux filesystem (setting aside the Windows porting code ...). It did its job well: it was very easy to configure, node membership was simple, locking was disk based (so very slow, but it existed), and you could create regular files and do regular filesystem operations to a certain extent, but anything that was not database-datafile related was just not very useful in general. Logfiles, OK; standard filesystem use, not so much. Up to this point, all the work was done at Oracle, by Oracle developers.

    Once OCFS (1) was out for a while and there was a lot of use in the database RAC world, many customers wanted to do more and were asking for features that you'd expect in a normal native filesystem, a real "general purpose cluster filesystem". So the team sat down and basically started from scratch to implement what's now known as OCFS2 (Oracle Cluster FileSystem release 2). Some basic criteria were:

  • Design it with a real Distributed Lock Manager and use the network for lock negotiation instead of the disk
  • Make it a native Linux filesystem instead of a portable core with a native shim layer
  • Support standard POSIX compliance and be fully cache coherent with all operations
  • Support all the filesystem features Linux offers (ACLs, extended attributes, quotas, sparse files, ...)
  • Be modern: support large files, 32/64-bit, journaling, ordered data journaling, endian neutrality (we can mount across endianness and architectures), ...

    Needless to say, this was a huge development effort that took many years to complete. A few big milestones happened along the way...

  • OCFS2 was developed in the open. We did not have a private tree that we worked on without external code review from the Linux filesystem maintainers; great folks like Christoph Hellwig reviewed the code regularly to make sure we were not doing anything out of line, and we submitted the code for review on lkml a number of times to see if we were getting close to it being included into the mainline kernel. Using this development model is standard practice for anyone that wants to write code that goes into the kernel and have any chance of doing so without a complete rewrite or, shall I say, flamefest when submitted. It saved us a tremendous amount of time by not having to re-fit code to get it into a Linus-acceptable state. Some other filesystems that were trying to get into the kernel and didn't follow an open development model had a much harder time and got much harsher criticism.
  • In March 2006, when Linus released 2.6.16, OCFS2 officially became part of the mainline kernel. It was accepted a little earlier in the release candidates, but with 2.6.16 OCFS2 officially became part of the mainline Linux kernel tree as one of the many filesystems. It was the first cluster filesystem to make it into the kernel tree. Our hope was that it would then end up getting picked up by the distribution vendors to make it easy for everyone to have access to a CFS. Today the source code for OCFS2 is approximately 85,000 lines of code.
  • We made OCFS2 production-ready, with full support for customers that ran the Oracle database on Linux, with no extra or separate support contract needed. OCFS2 1.0.0 started being built for RHEL4 for x86, x86-64, ppc, s390x and ia64; for RHEL5 this started with OCFS2 1.2.
  • SuSE was very interested in high availability and clustering and decided to build and include OCFS2 with SLES9 for their customers and was, next to Oracle, the main contributor to the filesystem for both new features and bug fixes.
  • Source code was always available even prior to inclusion into mainline and as of 2.6.16, source code was just part of a Linux kernel download from kernel.org, which it still is, today. So the latest OCFS2 code is always the upstream mainline Linux kernel.
  • OCFS2 is the cluster filesystem used in Oracle VM 2 and Oracle VM 3 as the virtual disk repository filesystem.
  • Since the filesystem is in the Linux kernel it's released under the GPL v2
  • The release model has always been that new feature development happened in the mainline kernel and we then built consistent, well tested, snapshots that had versions, 1.2, 1.4, 1.6, 1.8. But these releases were effectively just snapshots in time that were tested for stability and release quality.

    OCFS2 is very easy to use: there's a simple text file that contains the node information (hostname, node number, cluster name) and a file that contains the cluster heartbeat timeouts (a minimal example follows the feature list below). It is very small and very efficient. As Sunil Mushran wrote in the manual:

    "OCFS2 is an efficient, easily configured, quickly installed, fully integrated and compatible, feature-rich, architecture and endian neutral, cache coherent, ordered data journaling, POSIX-compliant, shared disk cluster file system."

    Here is a list of some of the important features that are included:

  • Variable Block and Cluster sizes: Supports block sizes ranging from 512 bytes to 4 KB and cluster sizes ranging from 4 KB to 1 MB (increments in power of 2).
  • Extent-based Allocations: Tracks the allocated space in ranges of clusters, making it especially efficient for storing very large files.
  • Optimized Allocations: Supports sparse files, inline-data, unwritten extents, hole punching and allocation reservation for higher performance and efficient storage.
  • File Cloning/snapshots: REFLINK is a feature which introduces copy-on-write clones of files in a cluster-coherent way.
  • Indexed Directories: Allows efficient access to millions of objects in a directory.
  • Metadata Checksums: Detects silent corruption in inodes and directories.
  • Extended Attributes: Supports attaching an unlimited number of name:value pairs to filesystem objects like regular files, directories, symbolic links, etc.
  • Advanced Security: Supports POSIX ACLs and SELinux in addition to the traditional file access permission model.
  • Quotas: Supports user and group quotas.
  • Journaling: Supports both ordered and writeback data journaling modes to provide filesystem consistency in the event of power failure or system crash.
  • Endian and Architecture neutral: Supports a cluster of nodes with mixed architectures. Allows concurrent mounts on nodes running 32-bit and 64-bit, little-endian (x86, x86_64, ia64) and big-endian (ppc64) architectures.
  • In-built Cluster-stack with DLM: Includes an easy to configure, in-kernel cluster-stack with a distributed lock manager.
  • Buffered, Direct, Asynchronous, Splice and Memory Mapped I/Os: Supports all modes of I/O for maximum flexibility and performance.
  • Comprehensive Tools Support: Provides a familiar EXT3-style tool-set that uses similar parameters for ease-of-use.
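
    To show how small the configuration mentioned earlier really is, here is a minimal two-node sketch; the cluster name, node names, addresses, device and mount point are all placeholders, and details can differ slightly between OCFS2/o2cb releases.

        # /etc/ocfs2/cluster.conf, identical on both nodes
        # (the key = value lines are indented with a tab)
        cluster:
                node_count = 2
                name = demo

        node:
                ip_port = 7777
                ip_address = 192.168.1.101
                number = 0
                name = node1
                cluster = demo

        node:
                ip_port = 7777
                ip_address = 192.168.1.102
                number = 1
                name = node2
                cluster = demo

        # bring the cluster stack online and set the heartbeat timeouts (interactive)
        service o2cb configure

        # format the shared device once, from one node, then mount it on every node
        mkfs.ocfs2 -L shared01 -N 2 /dev/sdb1
        mkdir -p /ocfs2/shared01
        mount -t ocfs2 /dev/sdb1 /ocfs2/shared01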

    The filesystem was distributed for Linux distributions in separate RPM form, and this had to be built for every single kernel errata release or every updated kernel provided by the vendor. We provided builds from Oracle for Oracle Linux and all kernels released by Oracle, and for Red Hat Enterprise Linux. SuSE provided the modules directly for every kernel they shipped. With the introduction of the Unbreakable Enterprise Kernel for Oracle Linux and our interest in reducing the overhead of building filesystem modules for every minor release, we decided to make OCFS2 available as part of UEK. There was no more need for separate kernel modules; everything was built in and a kernel upgrade automatically updated the filesystem, as it should. UEK allowed us to avoid backporting new upstream filesystem code into an older kernel version; backporting features into older versions introduces risk and requires extra testing because the code is basically partially rewritten. The UEK model works really well for continuing to provide OCFS2 without that extra overhead.

    Because the RHEL kernel does not contain OCFS2 as a kernel module (it is in the source tree but it is not built by the vendor in kernel module form), we stopped adding the extra packages to Oracle Linux with its RHEL-compatible kernel and to RHEL. Oracle Linux customers/users obviously get OCFS2 included as part of the Unbreakable Enterprise Kernel, SuSE customers get it distributed by SuSE with SLES, and Red Hat can decide to distribute OCFS2 to their customers if they choose to, as it's just a matter of compiling the module and making it available.

    OCFS2 today, in the mainline kernel, is pretty much feature complete in terms of integration with every filesystem feature Linux offers, and it is still actively maintained, with Joel Becker being the primary maintainer. Since we use OCFS2 as part of Oracle VM, we continue to look at interesting new functionality to add (REFLINK was a good example), and as such we continue to enhance the filesystem where it makes sense. Bugfixes, and any code that goes into the mainline Linux kernel and affects filesystems, automatically also apply to OCFS2. So it's in the kernel and actively maintained, but there is not a lot of new development happening at this time. We continue to fully support OCFS2 as part of Oracle Linux and the Unbreakable Enterprise Kernel, and other vendors make their own decisions on support, as it's really a Linux cluster filesystem now more than something that we provide to customers. It really just is part of Linux like EXT3 or BTRFS etc; the OS distribution vendors decide.

    Do not confuse OCFS2 with ACFS (ASM Cluster File System), also known as Oracle Cloud File System. ACFS is a filesystem that's provided by Oracle on various OS platforms and really integrates into Oracle ASM (Automatic Storage Management). It's a very powerful cluster filesystem, but it's not distributed as part of the operating system; it's distributed with the Oracle Database product and installs with and lives inside Oracle ASM. ACFS obviously is fully supported on Linux (Oracle Linux, Red Hat Enterprise Linux), but OCFS2, independently, as a native Linux filesystem, is also supported and continues to be. ACFS is very much tied into the Oracle RDBMS; OCFS2 is just a standard native Linux filesystem with no ties into Oracle products. Customers running the Oracle database and ASM really should consider using ACFS, as it also provides storage/clustered volume management. Customers wanting a simple, easy to use, generic Linux cluster filesystem should consider using OCFS2.

    To learn more about OCFS2 in detail, you can find good documentation on http://oss.oracle.com/projects/ocfs2 in the Documentation area, or get the latest mainline kernel from http://kernel.org and read the source.

    One final, unrelated note: since I am not always able to publicly answer or respond to comments, I do not want to selectively publish comments from readers. Sometimes I forget to publish comments, sometimes I publish them, and sometimes I would publish them but cannot publicly comment on them for some reason, which makes for a very one-sided stream. So for now I am not going to publish comments from anyone, to be fair to all sides. You are always welcome to email me and I will do my best to respond to technical questions; questions about strategy or direction are sometimes not possible to answer for obvious reasons.

    About

    Wim Coekaerts is the Senior Vice President of Linux and Virtualization Engineering for Oracle. He is responsible for Oracle's complete desktop to data center virtualization product line and the Oracle Linux support program.

    You can follow him on Twitter at @wimcoekaerts
