Thursday Mar 29, 2012

4.8M wasn't enough so we went for 5.055M tpmC with Unbreakable Enterprise Kernel r2 :-)

We released a new set of benchmark results today. One is an update to the TPC-C result from a few months ago, where we had just over 4.8M tpmC at $0.98; we have now taken that to 5.055M tpmC at $0.89. The other one is related to Java middleware performance. You can find the press release here.

Now, I don't want to talk about the actual relevance of the benchmark numbers, as I am not in the benchmark team. I want to talk about why these numbers and these efforts, unrelated to what they mean for your workload, matter to customers. The actual benchmark effort is a very big, long, expensive undertaking where many groups work together as a big virtual team. Having that virtual team within a single company of course helps tremendously... We start with a very big server setup: tons of storage, many disks, lots of RAM, lots of CPUs, cores and threads, and large database setups. Getting the whole setup going so that tuning can even start is no easy task by itself, but then the real fun starts: tuning the system for optimal performance -and- stability. A benchmark is not just revving an engine at high rpm, it's actually hitting the circuit. The tests require long runs and have to survive availability tests, such as crashes -and- recovery under load.

In the TPC-C example, the x4800 system had 4TB of RAM, 160 threads (8 sockets, hyperthreaded, 10 cores/socket), tons of storage attached, tons of LUNs visible to the OS, flash storage, non-flash storage... many things at high scale that all have to be perfectly synchronized.

During this process, we find bugs, we fix bugs, we find performance issues, we fix performance issues, we find interesting potential features to investigate for the future, we start new development projects for future releases, and all of this goes back into the products. As more and more Oracle Linux customers run larger and larger, faster and faster, more mission-critical, more highly available databases..., these things are just absolutely critical. Regardless of anyone's specific opinion about TPC-C or TPC-H or SPECjEnterprise etc., there is a ton of effort here that the customer benefits from. All this work makes Oracle Linux and/or Oracle Solaris better platforms, whether that's faster, more stable, more scalable or more resilient. It helps.

Another point that I always like to reiterate around UEK and UEK2 : we have our kernel source git repository online, with the complete changelog of the mainline kernel plus our changes. It is easy to pull, easy to dissect, easy to know what went in when, why and where. No need to log into a website and manually click through pages to hopefully discover changes or patches. No need to untar two tarballs and run a diff.

Monday Mar 26, 2012

Using EPEL repos with Oracle Linux

There's a Fedora project called EPEL which hosts a set of additional packages that can be installed on top of various distributions such as Red Hat Enterprise Linux, CentOS, Scientific Linux and of course also Oracle Linux. These packages are not distributed by the distribution vendor and as such are also not supported by the vendors (including Oracle); however, for users that want to pick up some useful extras, it's very easy to do.

All you need to do is download the EPEL RPM from the website, install it on Oracle Linux 5 or Oracle Linux 6 and run yum install or yum search to get the packages.

example : 
# wget http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-5.noarch.rpm

# rpm -ivh epel-release-6-5.noarch.rpm

# yum repolist
Loaded plugins: refresh-packagekit, rhnplugin
repo id                repo name                                          status
epel                   Extra Packages for Enterprise Linux 6 - x86_64      7,124
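
From there, getting extra packages is a normal yum operation; for example (the package name here is purely an illustration, pick whatever you are after):

# yum search htop
# yum install htop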

The folks that build these repositories are doing a great job at adding very useful packages. They are free, but also unsupported of course.

Thursday Mar 22, 2012

Setting up Oracle Linux 6 with public-yum for all updates

I just wanted to give you a quick example of how to get started with Oracle Linux 6 and start using the updates we publish on http://public-yum.oracle.com.

  • Download Oracle Linux (without the requirement of a support subscription) from http://edelivery.oracle.com/linux.

  • Install Oracle Linux from the ISO or DVD image

  • Log in as user root
  • Download the yum repo file from http://public-yum.oracle.com

    # cd /etc/yum.repos.d
    # wget http://public-yum.oracle.com/public-yum-ol6.repo
    

  • If you want, you can edit the repo file and enable other repositories. I enabled [ol6_UEK_latest] by just setting enabled=1 in the file with a text editor; the resulting stanza is sketched below.
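
    The stanza then looks roughly like this (a sketch from memory; check the downloaded file for the exact name and baseurl):

    [ol6_UEK_latest]
    name=Latest Unbreakable Enterprise Kernel for Oracle Linux 6 ($basearch)
    baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL6/UEK/latest/$basearch/
    gpgkey=http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol6
    gpgcheck=1
    enabled=1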

  • Run yum repolist to show the registered channels; you will see that we include everything, up to and including the latest published RPMs.

    Now you can just run yum update, and any time we release new security or bugfix errata for OL6, they will be posted and you will automatically get them. It's very easy, very convenient and actually very cool. We do a lot more than just build OL RPMs and distribute them: we have a very comprehensive test farm where we test the packages extensively.

  • Lots of goodies

    We just issued a press release with a number of very good updates for everyone.

    There are a few things of importance :

    1) As of right now, Oracle Linux 6 with the Unbreakable Enterprise Kernel is certified with a number of Oracle products, such as Oracle Database 11gR2 and Oracle Fusion Middleware. The certification pages in the Oracle Support portal will be updated with the latest certification status for the various products.

    As always we have gone through a long period of very comprehensive testing and validation to ensure that the whole stack works really well together, with very large database workloads, middleware application workloads etc.

    2) Standard certification efforts for Oracle Linux 6 with the Red Hat Compatible Kernel are in progress and we expect that to be completed in the next few months. Because of the compatibility between OL6 and RHEL6 we can then also state certification for RHEL6.

    3) Oracle Linux binaries (and of course source code) have been free to download -and- use (including production use, not just trial periods) since day one. You can freely redistribute the binaries, unlike with many other Linux vendors where you need to pay for a support subscription to even get access to the binaries. We offered both the base distribution release DVDs (OL4, OL5, OL6) and the update releases, such as 5.1, 5.2 etc., this way. Today, with this announcement, we also started to make available the bugfix and security updates released in between these update releases. So the errata streams (both binary and source code) for OL4, 5 and 6 are now free to download and use from http://public-yum.oracle.com. This includes UEK and UEK2.

    The nice thing is: if you want a complete, up-to-date system without support, use this; if you then need support, get a support subscription. Simple, convenient, effective. We have great SLAs for producing our update streams, consistency in release timing, and testing of all the components.

    Have at it!

    Wednesday Feb 22, 2012

    DTrace update to 0.2

    We just put an updated version of DTrace on ULN.
    This is version 0.2, another preview (hence the 0.x...)

    To test it out :

    Register a server with ULN and add the following channel to the server list : ol6_x86_64_Dtrace_BETA.

    When you have that channel registered with your server, you can install the following required RPMs :

    dtrace-modules-2.6.39-101.0.1.el6uek
    dtrace-utils
    kernel-uek-headers-2.6.39-101.0.1.el6uek.x86_64
    kernel-uek-devel-2.6.39-101.0.1.el6uek.x86_64
    kernel-uek-2.6.39-101.0.1.el6uek.x86_64
    kernel-uek-firmware-2.6.39-101.0.1.el6uek.noarch
    

    Once the RPMs are installed, reboot the server into the correct kernel : 2.6.39-101.0.1.el6uek.

    The DTrace modules are installed in /lib/modules/2.6.39-101.0.1.el6uek.x86_64/kernel/drivers/dtrace.

    # cd /lib/modules/2.6.39-101.0.1.el6uek.x86_64/kernel/drivers/dtrace
    # ls
    
    dtrace.ko  dt_test.ko  profile.ko  sdt.ko  systrace.ko
    
    Load the DTrace modules into the running kernel:
    # modprobe dtrace 
    # modprobe profile
    # modprobe sdt
    # modprobe systrace
    # modprobe dt_test
    

    The DTrace compiler is in /usr/sbin/dtrace. There are a few README files in /usr/share/doc/dtrace-0.2.4;
    these explain what's there and what's not yet there...

    New features:

    - The SDT provider is implemented, providing in-kernel static probes. Some of the proc provider is implemented using this facility.

    Bugfixes:

    - Syscall tracing of stub-based syscalls (such as fork, clone, exit, and sigreturn) now works.
    - Invalid memory accesses inside D scripts no longer cause oopses or panics.
    - Memory exhaustion inside D scripts no longer emits spurious oopses.
    - Several crash fixes.
    - Fixes to arithmetic inside aggregations, fixing quantize().
    - Improvements to the installed headers.

    We are also getting pretty good coverage on both userspace and kernel in terms of the DTrace testsuites.
    Thanks to the team working on it!

    Below are a few examples with output and source code for the .d scripts.

    
    activity.d    - this shows ongoing activity in terms of what program was executing, 
              what its parent is, and how long it ran. This makes use of the proc SDT provider.
    pstrace.d     - this is similar, but instead of providing timing, it lists the ancestry
              of a process, based on whatever history is collected during the DTrace runtime 
              of this script. This makes use of the proc SDT provider.
    rdbufsize.d   - this shows quantized results for buffer sizes used in read syscalls, 
              i.e. it gives a statistical breakdown of the sizes passed to the read() syscall, 
              which can be useful to see what buffer sizes are commonly used.
    
    =====================
    activity.d
    =====================
    
    #pragma D option quiet
    
    proc:::create
    {
        this->pid = *((int *)arg0 + 171);
    
        time[this->pid] = timestamp;
        p_pid[this->pid] = pid;
        p_name[this->pid] = execname;
        p_exec[this->pid] = "";
    }
    
    proc:::exec
    /p_pid[pid]/
    {
        p_exec[pid] = stringof(arg0);
    }
    
    proc:::exit
    /p_pid[pid] && p_exec[pid] != ""/
    {
        printf("%d: %s (%d) executed %s (%d) for %d msecs\n",
               timestamp, p_name[pid], p_pid[pid], p_exec[pid], pid,
               (timestamp - time[pid]) / 1000);
    }
    
    proc:::exit
    /p_pid[pid] && p_exec[pid] == ""/
    {
        printf("%d: %s (%d) forked itself (as %d) for %d msecs\n",
               timestamp, p_name[pid], p_pid[pid], pid,
               (timestamp - time[pid]) / 1000);
    
    }
    
    
    =============
    pstrace.d
    =============
    
    
    #pragma D option quiet
    
    proc:::create
    {
        this->pid = *((int *)arg0 + 171);
    
        p_pid[this->pid] = pid;
        p_name[this->pid] = execname;
        p_exec[this->pid] = "";
        path[this->pid] = strjoin(execname, " ->  ");
    }
    
    proc:::create
    /p_pid[pid]/
    {
        this->pid = *((int *)arg0 + 171);
    
        path[this->pid] = strjoin(path[pid], " ->  ");
    }
    
    proc:::exec
    /p_pid[pid]/
    {
        this->path = basename(stringof(arg0));
    
        path[pid] = strjoin(p_name[pid], strjoin(" ->  ", this->path));
        p_exec[pid] = this->path;
    }
    
    proc:::exit
    /p_pid[pid] && p_exec[pid] != ""/
    {
        printf("%d: %s[%d] ->  %s[%d]\n",
               timestamp, p_name[pid], p_pid[pid], p_exec[pid], pid);
    
        p_name[pid] = 0;
        p_pid[pid] = 0;
        p_exec[pid] = 0;
        path[pid] = 0;
    }
    
    proc:::exit
    /p_pid[pid] && p_exec[pid] == ""/
    {
        printf("%d: %s[%d] ->  [%d]\n",
               timestamp, p_name[pid], p_pid[pid], pid);
    
        p_name[pid] = 0;
        p_pid[pid] = 0;
        p_exec[pid] = 0;
        path[pid] = 0;
    }
    
    proc:::create
    /path[pid] != ""/
    {
        this->pid = *((int *)arg0 + 171);
    
        p_name[this->pid] = path[pid];
    }
    
    
    ==================
    rdbufsize.d
    ==================
    
    syscall::read:entry
    {
        @["read"] = quantize(arg2);
    }
    
    
    Since we do not yet have CTF support, the scripts use raw memory access to get at the pid field in the task_struct :
    this->pid = *((int *)arg0 + 171);
    where arg0 is a pointer to struct task_struct (arg0 is passed to the proc:::create probe when a new task/thread/process is created).
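
    For comparison, once CTF support is in place, one would presumably be able to write the same lookup with proper types, along these lines (a sketch, not working code in 0.2):

    this->pid = ((struct task_struct *)arg0)->pid;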

    Just cut and paste these scripts into text files and run them; I have some sample output below :
    
    activity.d (here I just ran some commands in a separate shell, which then show up in the output)
    # dtrace -s activity.d 
    2134889238792594: automount (1736) forked itself (as 11484) for 292 msecs
    2134912932312379: bash (11488) forked itself (as 11489) for 1632 msecs
    2134912934171504: bash (11488) forked itself (as 11491) for 1319 msecs
    2134912937531743: bash (11488) forked itself (as 11493) for 2150 msecs
    2134912939231853: bash (11488) forked itself (as 11496) for 1366 msecs
    2134912945152337: bash (11488) forked itself (as 11499) for 1135 msecs
    2134912948946944: bash (11488) forked itself (as 11503) for 1285 msecs
    2134912923230099: sshd (11485) forked itself (as 11486) for 8790195 msecs
    2134912932092719: bash (11489) executed /usr/bin/id (11490) for 1005 msecs
    2134912945773882: bash (11488) forked itself (as 11501) for 328 msecs
    2134912937325453: bash (11493) executed /usr/bin/tput (11495) for 721 msecs
    2134912941951947: bash (11488) executed /bin/grep (11498) for 1418 msecs
    2134912933963262: bash (11491) executed /bin/hostname (11492) for 804 msecs
    2134912936358611: bash (11493) executed /usr/bin/tty (11494) for 626 msecs
    2134912939035204: bash (11496) executed /usr/bin/dircolors (11497) for 789 msecs
    2134912944986994: bash (11499) executed /bin/uname (11500) for 621 msecs
    2134912946568141: bash (11488) executed /bin/grep (11502) for 1003 msecs
    2134912948757031: bash (11503) executed /usr/bin/id (11504) for 796 msecs
    2134913874947141: ksmtuned (1867) forked itself (as 11505) for 2189 msecs
    2134913883976223: ksmtuned (11507) executed /bin/awk (11509) for 8056 msecs
    2134913883854384: ksmtuned (11507) executed /bin/ps (11508) for 8122 msecs
    2134913884227577: ksmtuned (1867) forked itself (as 11507) for 9025 msecs
    2134913874664300: ksmtuned (11505) executed /bin/awk (11506) for 1307 msecs
    2134919238874188: automount (1736) forked itself (as 11511) for 263 msecs
    2134920459512267: bash (11488) executed /bin/ls (11512) for 1682 msecs
    2134930786318884: bash (11488) executed /bin/ps (11513) for 7241 msecs
    2134933581336279: bash (11488) executed /bin/find (11514) for 161853 msecs
    
    
    pstrace.d (as daemons or shells/users execute binaries, they show up automatically)
    # dtrace -s pstrace.d 
    2134960378397662: bash[11488] ->  ps[11517]
    2134962360623937: bash[11488] ->  ls[11518]
    2134964238953132: automount[1736] ->  [11519]
    2134965712514625: bash[11488] ->  df[11520]
    2134971432047109: bash[11488] ->  top[11521]
    2134973888279789: ksmtuned[1867] ->  [11522]
    2134973897131858: ksmtuned ->  [11524] ->  awk[11526]
    2134973896999204: ksmtuned ->  [11524] ->  ps[11525]
    2134973897400622: ksmtuned[1867] ->  [11524]
    2134973888019910: ksmtuned ->  [11522] ->  awk[11523]
    2134981995742661: sshd ->  sshd ->  bash[11531] ->  [11532]
    2134981997448161: sshd ->  sshd ->  bash[11531] ->  [11534]
    2134982000599413: sshd ->  sshd ->  bash[11531] ->  [11536]
    2134982002035206: sshd ->  sshd ->  bash[11531] ->  [11539]
    2134982007815639: sshd ->  sshd ->  bash[11531] ->  [11542]
    2134982011627125: sshd ->  sshd ->  bash[11531] ->  [11546]
    2134981989026168: sshd ->  sshd[11529] ->  [11530]
    2134982008472173: sshd ->  sshd ->  bash[11531] ->  [11544]
    2134981995518210: sshd ->  sshd ->  bash ->  [11532] ->  id[11533]
    2134982000393612: sshd ->  sshd ->  bash ->  [11536] ->  tput[11538]
    2134982004531164: sshd ->  sshd ->  bash[11531] ->  grep[11541]
    2134981997256114: sshd ->  sshd ->  bash ->  [11534] ->  hostname[11535]
    2134981999476476: sshd ->  sshd ->  bash ->  [11536] ->  tty[11537]
    2134982001865119: sshd ->  sshd ->  bash ->  [11539] ->  dircolors[11540]
    2134982007610268: sshd ->  sshd ->  bash ->  [11542] ->  uname[11543]
    2134982009271769: sshd ->  sshd ->  bash[11531] ->  grep[11545]
    2134982011408808: sshd ->  sshd ->  bash ->  [11546] ->  id[11547]
    
    
    rdbufsize.d (in another shell I just did some random read operations and this
    shows a summary)
    # dtrace -s rdbufsize.d 
    dtrace: script 'rdbufsize.d' matched 1 probe
    ^C
    
      read                                              
               value  ------------- Distribution ------------- count    
                  -1 |                                         0        
                   0 |                                         8        
                   1 |                                         59       
                   2 |                                         209      
                   4 |                                         72       
                   8 |                                         488      
                  16 |                                         67       
                  32 |                                         1074     
                  64 |                                         113      
                 128 |                                         88       
                 256 |                                         384      
                 512 |@@@                                      6582     
                1024 |@@@@@@@@@@@@@@@@@@                       44787    
                2048 |@                                        2419     
                4096 |@@@@@@@                                  16239    
                8192 |@@@@                                     10395    
               16384 |@@@@@@                                   14784    
               32768 |                                         427      
               65536 |                                         669      
              131072 |                                         143      
              262144 |                                         43       
              524288 |                                         46       
             1048576 |                                         92       
             2097152 |                                         196      
             4194304 |                                         0   
    
    

    Saturday Feb 04, 2012

    Changing database repositories in Oracle VM 3

    At home I have a small Atom-based server that was running Oracle VM Manager 3, installed using simple installation. Simple installation is the option where you just enter a password and the Oracle VM Manager installer installs the Oracle XE database, WebLogic Server and the Oracle VM Manager container. The same password is used for the database user, the Oracle VM Manager database schema user, the weblogic user and the admin user for the manager instance.

    The manager instance stores its data as objects inside the database. To do that, there is something called a datasource, defined in WebLogic during installation. It's basically a JDBC connection from WebLogic to the database. This datasource requires the following information : database hostname, database instance name, database listener port number, schema username and schema password. In my default install this was localhost, XE, 1521, ovs, mypassword.

    Now that I have reorganized my machines a bit, I have a larger server that runs a normal 11.2.0.3 database, which I also happen to use for EM12c. So I figured I would take some load off the little Atom server: keep it running Oracle VM Manager, but shut down XE and move the schema over to my dedicated database host. This is a straightforward process, so I just wanted to list the steps.

    1) shut down Oracle VM Manager so that it does not continue updating the repository.
    as root : /etc/init.d/ovmm stop
    
    2) export the schema user using the exp command for Oracle XE
    as oracle : 
    cd /u01/app/oracle/product/11.2.0/xe
    export ORACLE_HOME=`pwd`
    export ORACLE_SID=XE
    export PATH=$ORACLE_HOME/bin:$PATH
    exp
    (enter user ovs and its password)
    export user (option 2)
    export everything including data
    this will create (by default) a file called expdat.dmp
    copy this file over to the other server with the other database
    The schema name is also in /u01/app/oracle/ovm-manager-3/.config (OVSSCHEMA)
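    (a non-interactive equivalent would be something along these lines, just a sketch :
     exp ovs/MyPassword owner=ovs file=expdat.dmp )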
    
    3) shut down oracle-xe as it's no longer needed
    as root : /etc/init.d/oracle-xe stop
    
    4) import the ovs user into the new database. I like to do this as the user itself,
    so I simply pre-create the schema before starting the import
    as oracle : 
    sqlplus '/ as sysdba'
    create user ovs identified by MyPassword;
    grant connect,resource to ovs;
    at this point, run the imp utility on the box to import the expdat.dmp
    import asks for username/password, enter ovs and its password
    import yes on all data and tables and content.
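    (non-interactively this would be something like, again just a sketch :
     imp ovs/MyPassword file=expdat.dmp full=y )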
    
    At this point you have a good complete repository. 
    Now let's make the Oracle VM Manager weblogic instance point to the new database.
    
    5) on the original system, restart weblogic
    as root :/etc/init.d/ovmm start
    wait a few minutes for the instance to come online
    
    6) use the ovm_admin tool
    as oracle : 
    cd /u01/app/oracle/ovm-manager-3/bin
    ./ovm_admin --modifyds orcl wopr 1521 ovs mypassword
    My new host name for the 11.2.0.3 database is wopr, 
    the database instance is orcl and the listener is still on 1521, with schema ovs.
    The admin tool asks for a password; this is the weblogic user password. 
    In a simple install, this would be the same as your admin or ovs account password.
    
    7) restart to have everything take effect.
    as root : 
    /etc/init.d/ovmm stop  ; sleep 5 ;/etc/init.d/ovmm start ;
    
    8) edit the config file and update it with the new data
    vi /u01/app/oracle/ovm-manager-3/.config 
    modify :
    DBHOST=
    SID=
    LSNR=
    OVSSCHEMA=
    and leave the rest as is. 
    
    that should do it !
    
    

    Saturday Jan 28, 2012

    The latest bits around ocfs2

    It's been a while since we last posted something about ocfs2 on our Oracle blogs, but that doesn't mean the filesystem hasn't evolved. I had to write a bit of a summary for a customer, so I figured it would be a good idea to just add a blog entry and document it here as well.

    OCFS2 is a native Linux cluster filesystem that has been around for quite a few years now. It was developed at Oracle and has also had a ton of contributions from the folks at SuSE over several years. The filesystem was officially merged into 2.6.16, and all the changes since have been going into mainline first and then trickling down into the versions we build for Linux distributions. So we have ocfs2 versions 1.2, 1.4 and 1.6 (1.8 for Oracle VM 3), which are specific snapshots of the filesystem code released for specific kernels like 2.6.18 or 2.6.32.

    SLES builds its own version of ocfs2; other vendors decided not to compile in the filesystem, so for Oracle Linux we of course make sure we have current versions available as well. We also provide support for the filesystem as part of Oracle Linux support. You do not need to buy extra clustering or filesystem add-on options: the code is part of Oracle Linux and the support is part of our regular Oracle Linux support subscriptions.

    Many ocfs2 users use the filesystem as an alternative to NFS; when I read the threads on the public ocfs2 mailing lists, this is a comment that comes back frequently, so there must be some truth to it, as these are all unsolicited third-party comments :)... One nice thing with ocfs2 is that it's very easy to set up: just a simple text file (config file) on each node with the list of hostnames and IP addresses, and you're basically good to go (a sketch of that file follows below). One does need shared storage, as it's a real cluster filesystem. This shared storage can be iSCSI, SAN/FC or shared SCSI, and we highly recommend a private network so that you can isolate the cluster traffic. The main problem reports we get tend to be due to overloaded servers. In a cluster filesystem you really have to know what is going on with all servers, otherwise there is the potential for data corruption. This means that if a node gets in trouble, say with an overloaded network or because it is running out of memory, it will likely end up halting or rebooting so that the other servers can happily continue. A large percentage of customer reports tend to be related to misconfigured networks (sharing the interconnect/cluster traffic with everything else) or bad/slow disk subsystems that get overloaded, so that the heartbeat IOs cannot make it to the device.
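
    For reference, a minimal /etc/ocfs2/cluster.conf for a two-node cluster looks something like this (a sketch; cluster name, node names and addresses are made up):

    cluster:
            node_count = 2
            name = demo

    node:
            ip_port = 7777
            ip_address = 10.0.0.1
            number = 0
            name = node1
            cluster = demo

    node:
            ip_port = 7777
            ip_address = 10.0.0.2
            number = 1
            name = node2
            cluster = demo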

    One of the reasons ocfs2 is so trivial to configure is that the entire ecosystem is integrated. It comes with its own embedded clustering stack, o2cb. This mini, specialized cluster stack provides node membership, heartbeat services and a distributed lock manager. This stack is not designed to be a general-purpose userspace cluster stack, but is really tailored towards the basic requirements of our filesystem. Another really cool feature, I'd say it's in my top 3 cool features list for ocfs2, is dlmfs. dlmfs is a virtual filesystem that exposes a few simple lock types : shared read, exclusive and trylock. There's a libo2dlm to use this from applications, or you can simply use your shell to create a domain and locks just by doing mkdir and touching files (a rough sketch follows below). Someone with some time on their hands could theoretically, with some shell magic, hack together a little userspace cluster daemon that could monitor applications or nodes and handle start, stop and restart. It's on my todo list but I haven't had time :) Anyway, it's a very nifty feature.
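
    A rough sketch of driving dlmfs from the shell (assuming the usual ocfs2_dlmfs mount point /dlm; opening a lock file read-only takes a shared lock, read-write an exclusive one, and closing the file descriptor releases it):

    # mount -t ocfs2_dlmfs ocfs2_dlmfs /dlm
    # mkdir /dlm/mydomain             # create a lock domain
    # touch /dlm/mydomain/mylock      # create the lock file (briefly takes and drops an exclusive lock)
    # exec 3</dlm/mydomain/mylock     # fd held open read-only => shared lock held
    # exec 3<&-                       # closing the fd releases the lock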

    Anyway, I digress... One of the customer questions I had recently was about what's going on with ocfs2 and whether there has been any development effort. I decided to go look at the Linux kernels since 2.6.27 and collect the list of checkins that have happened since. These features are also in our latest ocfs2 as part of Oracle VM 3.0, and for the most part also in our kernel (the Unbreakable Enterprise Kernel). Here is the list; I think it's pretty impressive :

    
        Remove JBD compatibility layer 
        Add POSIX ACLs
        Add security xattr support (extended attributes for SELinux)
        Implement quota recovery 
        Periodic quota syncing 
        Implementation of local and global quota file handling
        Enable quota accounting on mount, disable on umount 
        Add a name indexed b-tree to directory inodes 
        Optimize inode allocation by remembering last group
        Optimize inode group allocation by recording last used group.
        Expose the file system state via debugfs 
        Add statistics for the checksum and ecc operations.
        Add CoW support. (reflink is unlimited inode-based (file based) writeable snapshots - very very useful for virtualization)
        Add ioctl for reflink. 
        Enable refcount tree support. 
        Always include ACL support 
        Implement allocation reservations, which reduces fragmentation significantly 
        Optimize punching-hole code, speeds up significantly some rare operations 
        Discontiguous block groups, necessary to improve some kinds of allocations. This feature sets an incompatible bit, i.e. older code will not mount a filesystem that uses it
        Make nointr ("don't allow file operations to be interrupted") a default mount option 
        Allow huge (> 16 TiB) volumes to mount   (support for huge volumes)
        Add a mount option "coherency=*" to handle cluster coherency for O_DIRECT writes.
        Add new OCFS2_IOC_INFO ioctl: offers the non-privileged end user a way to gather filesystem info 
        Add support for heartbeat=global mount option (instead of having a heartbeat per filesystem you can now have a single heartbeat)
        SSD trimming support 
        Support for moving extents (preparation for defragmentation)
    
    

    There are a number of external articles written about ocfs2, one that I found is here.

    Have fun...

    Monday Jan 16, 2012

    Using kexec for fast reboots on Oracle Linux with UEK.

    A feature that's not often talked about in Linux is kexec. kexec is part of an infrastructure that allows the Linux kernel to load and boot a new kernel directly: basically, jump right into executing the new kernel immediately, instead of going through the standard reset -> system power-on -> BIOS/firmware initialization -> memory/device discovery -> bootloader -> Linux kernel sequence.

    kexec's mechanism is most commonly used with kdump. Basically, with kdump, when a crash or panic occurs, a new kernel is booted after the crash while the memory from the previous kernel's runtime is preserved. The new kernel can then capture this data and generate the dump, which can go to local disk, remote disk or anywhere else for that matter. In order to use kdump, you basically have to allocate/reserve memory for this dump kernel. This is done by adding crashkernel=xxx@yyy to the kernel line in grub when booting (illustrated below). The crash kernel image is then loaded and will be executed when a crash or panic occurs. Even though kdump is a bit cumbersome to set up, it allows for really great flexibility and is very powerful in helping with debugging issues.
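
    For illustration, the kernel line in /boot/grub/grub.conf would gain a parameter like this (the size@offset values and root device below are just an example, not a recommendation; pick what suits your system):

    kernel /vmlinuz-2.6.32-200.13.1.el5uek ro root=/dev/sda1 crashkernel=128M@16M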

    For those interested in kdump, there's a good blog entry out there on testing kdump on Oracle Linux.
    Or for those that just want to read the documentation that's part of the Linux kernel tree : kdump.

    Anyway, this entry is not about kdump. kdump is great, but I wanted to talk about the use of kexec proper and how it can help with doing fast reboots of your systems. Both Oracle Linux 5 and Oracle Linux 6 have support for using kexec as the reboot mechanism (see /etc/init.d/halt for details). When a standard reboot command is executed, init goes to runlevel 6 and /etc/init.d/halt gets run. This script, when it sees that kexec has been configured with a kernel image, will just execute kexec -e. So in a standard reboot (not reboot -f) the normal shutdown scripts still get executed, but at the end, where the system would normally do a hard reset, it jumps straight into the new kernel.
    That hard reset is what makes the system go through the BIOS, do a memory test, find devices, initialize the devices and firmware, boot the boot device's bootloader and finally start the kernel.

    To set up kexec, you should run the following command shortly after you boot the system. If you want to automate this, it makes sense to add it to your rc scripts (a sketch follows below). We will look at integrating this more into the OS management scripts for Oracle Linux to make it easier for system administrators.

    kexec -l --append="`cat /proc/cmdline`" \
        --initrd=/boot/initrd-`uname -r`.img /boot/vmlinuz-`uname -r`
    In my case I am running 2.6.32-200.13.1.el5uek.
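
    To automate this across boots, something along these lines at the end of /etc/rc.d/rc.local should do (a sketch, same command as above):

    # stage the currently running kernel so a later reboot goes through kexec
    /sbin/kexec -l --append="$(cat /proc/cmdline)" \
        --initrd=/boot/initrd-$(uname -r).img /boot/vmlinuz-$(uname -r)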

    Once this is done, the new kernel image is prepared and memory is allocated; you can now do one of two things :

    - run reboot : halt, at the tail end of a normal reboot (after shutting down all services), will execute directly into this new kernel image, exactly the same way as you booted the OS to get to this point.

    - if you wish to do a very fast reboot without a clean shutdown (like reboot -f), then you do

    sync; umount -a ; kexec -e

    In this case you bypass all the service shutdown scripts and instantly jump-start the new kernel; this is by far the fastest way to restart your box.

    The total amount of time saved is highly dependent on your server. Basically, time a system startup all the way to grub executing the kernel image: that's the amount of time you will save on each subsequent reboot. This can range from a number of seconds (15-20) to, sometimes, several minutes.

    One caveat with the use of kexec and instant restarts without going through device resets is that, in some cases, devices might act badly or a driver might not do the right thing. Before you really rely on this on your system, test it first to ensure that the drivers for the hardware you have, and the devices themselves, are doing the right thing (tm).

    You can find more info about kexec in this article written a number of years ago : kexec article.

    I am planning on writing an entry next about how to use this with Oracle VM Server 3.

    Tuesday Oct 18, 2011

    get the rpms

    A few days ago I wrote this blog entry. It was a little example of how to use a container on Oracle Linux with our 2.6.39 kernel.

    We just pushed the RPMs to both public-yum and ULN. So you can subscribe to the UEK2 beta channel or configure your yum repository for public-yum and get the packages.

    To try out what I did in my blog, make sure you install the 2.6.39 uek2 kernel, get the latest btrfs-progs and get the lxc tools; a sketch follows below.
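
    With the beta channel or repository enabled, that boils down to something like this (a sketch; the package names are the ones used elsewhere in these posts):

    # yum install kernel-uek btrfs-progs lxc
    # reboot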

    Sunday Oct 16, 2011

    Containers on Linux

    At Oracle OpenWorld we talked about Linux Containers. Here is an example of getting a Linux container going with Oracle Linux 6.1, the UEK2 beta and btrfs. This is just an example: not released, not production, not bug-free... for those that don't read README files ;-)

    This container example uses the existing Linux cgroups features in the mainline kernel (also in UEK and UEK2) and the lxc tools to create the environments.

    Example assumptions :
    - Host OS is Oracle Linux 6.1 with UEK2 beta.
    - using btrfs filesystem for containers (to make use of snapshot capabilities)
    - mounting the fs in /container
    - use Oracle VM templates as a base environment
    - Oracle Linux 5 containers

    I have a second disk on my test machine (/dev/sdb) which I will use for this exercise.

    # mkfs.btrfs  -L container  /dev/sdb
    
    # mount
    /dev/mapper/vg_wcoekaersrv4-lv_root on / type ext4 (rw)
    proc on /proc type proc (rw)
    sysfs on /sys type sysfs (rw)
    devpts on /dev/pts type devpts (rw,gid=5,mode=620)
    tmpfs on /dev/shm type tmpfs (rw)
    /dev/sda1 on /boot type ext4 (rw)
    /dev/mapper/vg_wcoekaersrv4-lv_home on /home type ext4 (rw)
    none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
    sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
    /dev/mapper/loop0p2 on /mnt type ext3 (rw)
    /dev/mapper/loop1p2 on /mnt2 type ext3 (rw)
    /dev/sdb on /container type btrfs (rw)
    

    lxc tools installed...

    # rpm -qa|grep lxc
    lxc-libs-0.7.5-2.x86_64
    lxc-0.7.5-2.x86_64
    lxc-devel-0.7.5-2.x86_64
    

    lxc tools come with template config files :

    # ls /usr/lib64/lxc/templates/
    lxc-altlinux lxc-busybox lxc-debian lxc-fedora lxc-lenny lxc-ol4 lxc-ol5 lxc-opensuse lxc-sshd lxc-ubuntu
    I created one for Oracle Linux 5 : lxc-ol5.

    Download the Oracle VM template for OL5 from http://edelivery.oracle.com/linux. I used OVM_EL5U5_X86_PVM_10GB.
    We want to be able to create one environment that can be used in both container and VM mode, to avoid duplicate effort.

    Untar the VM template.

    # tar zxvf OVM_EL5U5_X86_PVM_10GB.tar.gz
    
    These are the steps needed (to be automated in the future)...
    Copy the content of the VM virtual disk's root filesystem into a btrfs subvolume in order to easily clone the base template.

    My template configure script defines :
    template_path=/container/ol5-template

    - create subvolume ol5-template on /container

    # btrfs subvolume create /container/ol5-template
    Create subvolume '/container/ol5-template'
    
    - loopback mount the Oracle VM template System image / partition
    # kpartx -a System.img 
    # kpartx -l System.img 
    loop0p1 : 0 192717 /dev/loop0 63
    loop0p2 : 0 21607425 /dev/loop0 192780
    loop0p3 : 0 4209030 /dev/loop0 21800205
    
    I need to mount the 2nd partition of the virtual disk image; kpartx sets up loopback devices for each of the virtual disk partitions. So let's mount loop0p2, which contains the Oracle Linux 5 / filesystem of the template.
    # mount /dev/mapper/loop0p2 /mnt
    
    # ls /mnt
    bin  boot  dev  etc  home  lib  lost+found  media  misc  mnt  opt  proc  
    root  sbin  selinux  srv  sys  tftpboot  tmp  u01  usr  var
    
    Great, now we have the entire template / filesystem available. Let's copy this into our subvolume. This subvolume will then become the basis for all OL5 containers.
    # cd /mnt
    # tar cvf - * | ( cd /container/ol5-template ; tar xvf - ; )
    
    In the near future we will put some automation around the above steps.
    # pwd
    /container/ol5-template
    
    # ls
    bin  boot  dev  etc  home  lib  lost+found  media  misc  mnt  opt  proc  
    root  sbin  selinux  srv  sys  tftpboot  tmp  u01  usr  var
    
    From this point on, the lxc-create script, using the template config as an argument, should be able to automatically create a snapshot and set up the filesystem correctly.
    # lxc-create -n ol5test1 -t ol5
    
    Cloning base template /container/ol5-template to /container/ol5test1 ...
    Create a snapshot of '/container/ol5-template' in '/container/ol5test1'
    Container created : /container/ol5test1 ...
    Container template source : /container/ol5-template
    Container config : /etc/lxc/ol5test1
    Network : eth0 (veth) on virbr0
    'ol5' template installed
    'ol5test1' created
    
    # ls /etc/lxc/ol5test1/
    config  fstab
    
    # ls /container/ol5test1/
    bin  boot  dev  etc  home  lib  lost+found  media  misc  mnt  opt  proc  
    root  sbin  selinux  srv  sys  tftpboot  tmp  u01  usr  var
    
    Now that it's created and configured, we should be able to just start it :
    # lxc-start -n ol5test1
    INIT: version 2.86 booting
                    Welcome to Enterprise Linux Server
                    Press 'I' to enter interactive startup.
    Setting clock  (utc): Sun Oct 16 06:08:27 EDT 2011         [  OK  ]
    Loading default keymap (us):                               [  OK  ]
    Setting hostname ol5test1:                                 [  OK  ]
    raidautorun: unable to autocreate /dev/md0
    Checking filesystems
                                                               [  OK  ]
    mount: can't find / in /etc/fstab or /etc/mtab
    Mounting local filesystems:                                [  OK  ]
    Enabling local filesystem quotas:                          [  OK  ]
    Enabling /etc/fstab swaps:                                 [  OK  ]
    INIT: Entering runlevel: 3
    Entering non-interactive startup
    Starting sysstat:  Calling the system activity data collector (sadc): 
                                                               [  OK  ]
    Starting background readahead:                             [  OK  ]
    Flushing firewall rules:                                   [  OK  ]
    Setting chains to policy ACCEPT: nat mangle filter         [  OK  ]
    Applying iptables firewall rules:                          [  OK  ]
    Loading additional iptables modules: no                    [FAILED]
    Bringing up loopback interface:                            [  OK  ]
    Bringing up interface eth0:  
    Determining IP information for eth0... done.
                                                               [  OK  ]
    Starting system logger:                                    [  OK  ]
    Starting kernel logger:                                    [  OK  ]
    Enabling ondemand cpu frequency scaling:                   [  OK  ]
    Starting irqbalance:                                       [  OK  ]
    Starting portmap:                                          [  OK  ]
    FATAL: Could not load /lib/modules/2.6.39-100.0.12.el6uek.x86_64/modules.dep: No such file or directory
    Starting NFS statd:                                        [  OK  ]
    Starting RPC idmapd: Error: RPC MTAB does not exist.
    Starting system message bus:                               [  OK  ]
    Starting o2cb:                                             [  OK  ]
    Can't open RFCOMM control socket: Address family not supported by protocol
    
    Mounting other filesystems:                                [  OK  ]
    Starting PC/SC smart card daemon (pcscd):                  [  OK  ]
    Starting HAL daemon:                                       [FAILED]
    Starting hpiod:                                            [  OK  ]
    Starting hpssd:                                            [  OK  ]
    Starting sshd:                                             [  OK  ]
    Starting cups:                                             [  OK  ]
    Starting xinetd:                                           [  OK  ]
    Starting crond:                                            [  OK  ]
    Starting xfs:                                              [  OK  ]
    Starting anacron:                                          [  OK  ]
    Starting atd:                                              [  OK  ]
    Starting yum-updatesd:                                     [  OK  ]
    Starting Avahi daemon...                                   [FAILED]
    Starting oraclevm-template...
    Regenerating SSH host keys.
    Stopping sshd:                                             [  OK  ]
    Generating SSH1 RSA host key:                              [  OK  ]
    Generating SSH2 RSA host key:                              [  OK  ]
    Generating SSH2 DSA host key:                              [  OK  ]
    Starting sshd:                                             [  OK  ]
    Regenerating up2date uuid.
    Setting Oracle validated configuration parameters.
    
    Configuring network interface.
      Network device: eth0
      Hardware address: 52:19:C0:EF:78:C4
    
    Do you want to enable dynamic IP configuration (DHCP) (Y|n)? 
    
    ... 
    
    This will run the well-known Oracle VM template configure scripts and set up the container the same way they would set up an Oracle VM guest.

    The session that runs lxc-start is the local console. It is best to run this session inside screen so you can disconnect and reconnect.
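
    For example, something like this works (a sketch):

    # screen -S ol5test1 lxc-start -n ol5test1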

    At this point, I can use lxc-console to log into the local console of the container, or, since the container has its internal network up and running and sshd is running, I can also just ssh into the guest.
    # lxc-console -n ol5test1 -t 1
    
    Enterprise Linux Enterprise Linux Server release 5.5 (Carthage)
    Kernel 2.6.39-100.0.12.el6uek.x86_64 on an x86_64
    
    host login: 
    
    I can simply get out of the console by entering ctrl-a q.

    From inside the container :
    # mount
    proc on /proc type proc (rw,noexec,nosuid,nodev)
    sysfs on /sys type sysfs (rw)
    devpts on /dev/pts type devpts (rw)
    none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
    
    # /sbin/ifconfig
    eth0      Link encap:Ethernet  HWaddr 52:19:C0:EF:78:C4  
              inet addr:192.168.122.225  Bcast:192.168.122.255  Mask:255.255.255.0
              inet6 addr: fe80::5019:c0ff:feef:78c4/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:141 errors:0 dropped:0 overruns:0 frame:0
              TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:8861 (8.6 KiB)  TX bytes:2476 (2.4 KiB)
    
    lo        Link encap:Local Loopback  
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:8 errors:0 dropped:0 overruns:0 frame:0
              TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:560 (560.0 b)  TX bytes:560 (560.0 b)
    
    # ps aux
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    root         1  0.0  0.0   2124   656 ?        Ss   06:08   0:00 init [3]  
    root       397  0.0  0.0   1780   596 ?        Ss   06:08   0:00 syslogd -m 0
    root       400  0.0  0.0   1732   376 ?        Ss   06:08   0:00 klogd -x
    root       434  0.0  0.0   2524   368 ?        Ss   06:08   0:00 irqbalance
    rpc        445  0.0  0.0   1868   516 ?        Ss   06:08   0:00 portmap
    root       469  0.0  0.0   1920   740 ?        Ss   06:08   0:00 rpc.statd
    dbus       509  0.0  0.0   2800   576 ?        Ss   06:08   0:00 dbus-daemon --system
    root       578  0.0  0.0  10868  1248 ?        Ssl  06:08   0:00 pcscd
    root       610  0.0  0.0   5196   712 ?        Ss   06:08   0:00 ./hpiod
    root       615  0.0  0.0  13520  4748 ?        S    06:08   0:00 python ./hpssd.py
    root       637  0.0  0.0  10168  2272 ?        Ss   06:08   0:00 cupsd
    root       651  0.0  0.0   2780   812 ?        Ss   06:08   0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
    root       660  0.0  0.0   5296  1096 ?        Ss   06:08   0:00 crond
    root       745  0.0  0.0   1728   580 ?        SNs  06:08   0:00 anacron -s
    root       753  0.0  0.0   2320   340 ?        Ss   06:08   0:00 /usr/sbin/atd
    root       817  0.0  0.0  25580 10136 ?        SN   06:08   0:00 /usr/bin/python -tt /usr/sbin/yum-updatesd
    root       819  0.0  0.0   2616  1072 ?        SN   06:08   0:00 /usr/libexec/gam_server
    root       830  0.0  0.0   7116  1036 ?        Ss   06:08   0:00 /usr/sbin/sshd
    root      2998  0.0  0.0   2368   424 ?        Ss   06:08   0:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient-eth0.leases -pf /var/run/dhc
    root      3102  0.0  0.0   5008  1376 ?        Ss   06:09   0:00 login -- root     
    root      3103  0.0  0.0   1716   444 tty2     Ss+  06:09   0:00 /sbin/mingetty tty2
    root      3104  0.0  0.0   1716   448 tty3     Ss+  06:09   0:00 /sbin/mingetty tty3
    root      3105  0.0  0.0   1716   448 tty4     Ss+  06:09   0:00 /sbin/mingetty tty4
    root      3138  0.0  0.0   4584  1436 tty1     Ss   06:11   0:00 -bash
    root      3167  0.0  0.0   4308   936 tty1     R+   06:12   0:00 ps aux
    
    From the host :
    # lxc-info -n ol5test1
    state:   RUNNING
    pid:     16539
    
    # lxc-kill -n ol5test1
    
    # lxc-monitor -n ol5test1
    'ol5test1' changed state to [STOPPING]
    'ol5test1' changed state to [STOPPED]
    
    So creating more containers is trivial. Just keep running lxc-create.
    # lxc-create -n ol5test2 -t ol5
    
    # btrfs subvolume list /container
    ID 297 top level 5 path ol5-template
    ID 299 top level 5 path ol5test1
    ID 300 top level 5 path ol5test2
    
    The lxc-tools will be uploaded to the uek2 beta channel so you can start playing with this.

    Oracle Linux 4 example

    Here is the same principle for Oracle Linux 4, using the template create script lxc-ol4. I started out with the OVM_EL4U7_X86_PVM_4GB template and followed the same steps.

    # kpartx -a System.img 
    
    # kpartx -l System.img 
    loop0p1 : 0 64197 /dev/loop0 63
    loop0p2 : 0 8530515 /dev/loop0 64260
    loop0p3 : 0 4176900 /dev/loop0 8594775
    
    # mount /dev/mapper/loop0p2 /mnt
    
    # cd /mnt
    
    # btrfs subvolume create /container/ol4-template
    Create subvolume '/container/ol4-template'
    
    # tar cvf - * | ( cd /container/ol4-template ; tar xvf - ; )
    
    # lxc-create -n ol4test1 -t ol4
    
    Cloning base template /container/ol4-template to /container/ol4test1 ...
    Create a snapshot of '/container/ol4-template' in '/container/ol4test1'
    Container created : /container/ol4test1 ...
    Container template source : /container/ol4-template
    Container config : /etc/lxc/ol4test1
    Network : eth0 (veth) on virbr0
    'ol4' template installed
    'ol4test1' created
    
    # lxc-start -n ol4test1
    INIT: version 2.85 booting
    /etc/rc.d/rc.sysinit: line 80: /dev/tty5: Operation not permitted
    /etc/rc.d/rc.sysinit: line 80: /dev/tty6: Operation not permitted
    Setting default font (latarcyrheb-sun16):                  [  OK  ]
    
                    Welcome to Enterprise Linux
                    Press 'I' to enter interactive startup.
    Setting clock  (utc): Sun Oct 16 09:34:56 EDT 2011         [  OK  ]
    Initializing hardware...  storage network audio done       [  OK  ]
    raidautorun: unable to autocreate /dev/md0
    Configuring kernel parameters:  error: permission denied on key 'net.core.rmem_default'
    error: permission denied on key 'net.core.rmem_max'
    error: permission denied on key 'net.core.wmem_default'
    error: permission denied on key 'net.core.wmem_max'
    net.ipv4.ip_forward = 0
    net.ipv4.conf.default.rp_filter = 1
    net.ipv4.conf.default.accept_source_route = 0
    kernel.core_uses_pid = 1
    fs.file-max = 327679
    kernel.msgmni = 2878
    kernel.msgmax = 8192
    kernel.msgmnb = 65536
    kernel.sem = 250 32000 100 142
    kernel.shmmni = 4096
    kernel.shmall = 1073741824
    kernel.sysrq = 1
    fs.aio-max-nr = 3145728
    net.ipv4.ip_local_port_range = 1024 65000
    kernel.shmmax = 4398046511104
                                                               [FAILED]
    Loading default keymap (us):                               [  OK  ]
    Setting hostname ol4test1:                                 [  OK  ]
    Remounting root filesystem in read-write mode:             [  OK  ]
    mount: can't find / in /etc/fstab or /etc/mtab
    Mounting local filesystems:                                [  OK  ]
    Enabling local filesystem quotas:                          [  OK  ]
    Enabling swap space:                                       [  OK  ]
    INIT: Entering runlevel: 3
    Entering non-interactive startup
    Starting sysstat:                                          [  OK  ]
    Setting network parameters:  error: permission denied on key 'net.core.rmem_default'
    error: permission denied on key 'net.core.rmem_max'
    error: permission denied on key 'net.core.wmem_default'
    error: permission denied on key 'net.core.wmem_max'
    net.ipv4.ip_forward = 0
    net.ipv4.conf.default.rp_filter = 1
    net.ipv4.conf.default.accept_source_route = 0
    kernel.core_uses_pid = 1
    fs.file-max = 327679
    kernel.msgmni = 2878
    kernel.msgmax = 8192
    kernel.msgmnb = 65536
    kernel.sem = 250 32000 100 142
    kernel.shmmni = 4096
    kernel.shmall = 1073741824
    kernel.sysrq = 1
    fs.aio-max-nr = 3145728
    net.ipv4.ip_local_port_range = 1024 65000
    kernel.shmmax = 4398046511104
                                                               [FAILED]
    Bringing up loopback interface:                            [  OK  ]
    Bringing up interface eth0:                                [  OK  ]
    Starting system logger:                                    [  OK  ]
    Starting kernel logger:                                    [  OK  ]
    Starting portmap:                                          [  OK  ]
    Starting NFS statd:                                        [FAILED]
    Starting RPC idmapd: Error: RPC MTAB does not exist.
    Mounting other filesystems:                                [  OK  ]
    Starting lm_sensors:                                       [  OK  ]
    Starting cups:                                             [  OK  ]
    Generating SSH1 RSA host key:                              [  OK  ]
    Generating SSH2 RSA host key:                              [  OK  ]
    Generating SSH2 DSA host key:                              [  OK  ]
    Starting sshd:                                             [  OK  ]
    Starting xinetd:                                           [  OK  ]
    Starting crond:                                            [  OK  ]
    Starting xfs:                                              [  OK  ]
    Starting anacron:                                          [  OK  ]
    Starting atd:                                              [  OK  ]
    Starting system message bus:                               [  OK  ]
    Starting cups-config-daemon:                               [  OK  ]
    Starting HAL daemon:                                       [  OK  ]
    Starting oraclevm-template...
    Regenerating SSH host keys.
    Stopping sshd:                                             [  OK  ]
    Generating SSH1 RSA host key:                              [  OK  ]
    Generating SSH2 RSA host key:                              [  OK  ]
    Generating SSH2 DSA host key:                              [  OK  ]
    Starting sshd:                                             [  OK  ]
    Regenerating up2date uuid.
    Setting Oracle validated configuration parameters.
    
    Configuring network interface.
      Network device: eth0
      Hardware address: D2:EC:49:0D:7D:80
    
    Do you want to enable dynamic IP configuration (DHCP) (Y|n)? 
    ...
    ...
    
    # lxc-console -n ol4test1
    
    Enterprise Linux Enterprise Linux AS release 4 (October Update 7)
    Kernel 2.6.39-100.0.12.el6uek.x86_64 on an x86_64
    
    localhost login: 
    
    

    Wednesday Oct 12, 2011

    passing of dmr a.k.a Dennis Ritchie

    More sad news: Dennis Ritchie passed away. Not covered in the media like the passing of Steve Jobs (equally sad), but dmr definitely deserves recognition for all the stuff he invented, coded, built and provided to the tech world.

    Rob Pike wrote a short entry here.

    respect.

    Monday Oct 10, 2011

    it's DTrace not Dtrace :)

    sorry, Bryan smacked me on the fingers because I misspelled DTrace :) so wherever I wrote Dtrace, do s/Dtrace/DTrace :) keeps Bryan happy :) Bryan we good now? ;) my bad for misspelling the name...

    Sunday Oct 09, 2011

    trying uek2

    Another Oracle OpenWorld revelation was the availability of version 2 of the Unbreakable Enterprise Kernel, in short UEK2. Let me put together a few notes on this as well.

    UEK2 is based on Linux 2.6.39, with applied changesets (not backports) up to Linux 3.0.3. The change from 2.6 to 3.0 by Linus was pretty much arbitrary this time around; there were no major changes in design that went into the kernel post 2.6.39. However, version changes like that do have an effect on userspace programs, so in order to maintain true compatibility and refrain from code changes, sticking to 2.6.39 was the best solution.

    The UEK2 beta is freely available to everyone. All you need is Oracle Linux 6 installed (also freely available from edelivery); then connect to the Unbreakable Linux Network (ULN) if you are an existing customer, or, if you just want to play with cool new stuff, use our public yum repository.

    The source code in our git repo is also publicly available on oss.oracle.com: http://oss.oracle.com/git/?p=linux-2.6-unbreakable-beta.git;a=summary. Indeed, not just a tarball of junk, but the entire changelog, all the checkins, all the history, nicely linked up with the original git repo from kernel.org.
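
    For example, to pull the tree and browse the full history (the clone URL here is my assumption based on the gitweb link above; adjust as needed):

    # git clone git://oss.oracle.com/git/linux-2.6-unbreakable-beta.git
    # cd linux-2.6-unbreakable-beta
    # git log --oneline | head    # the changelog: mainline commits plus our changes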

    There is a huge list of changes in the mainline kernel since 2.6.32, too long to sum up here. Some of the things we discussed as new features are the following :

    - btrfs production support.
    btrfs is an awesome filesystem which is going through rigorous testing in order to be able to declare it production quality with UEK2. The feature list of the filesystem is great; you can read some of my earlier blog entries to get a feel for it. Using it for the root filesystem and taking snapshots while upgrading packages is definitely an interesting enhancement to deliver. The existing Oracle Linux 6 btrfs progs work just fine, but the UEK2 channel will soon include an updated version of the btrfs userspace programs to give you full access to all the features in UEK2.

    - Linux containers
    This is not a port of Solaris Zones. Linux containers are based on lxc and all the cgroups work that has gone into the kernel over the last several years. All the kernel features required to handle container functionality are in UEK2, and as with the btrfs progs, we are going to update the beta channel with the lxc userspace tools to enable easy creation of Oracle Linux containers. Oracle VM templates will be easily converted to run as a container as well as a VM in Oracle VM.

    - Open vSwitch
    Open vSwitch is a very interesting virtual switch product that can help a lot, in particular in a virtual world, but it is also a good Linux bridge replacement. These packages will also be made available with UEK2 in the beta channel. More information about Open vSwitch can be found here.

    - many new enhancements, including even better performance than uek today.
    More details will be posted in upcoming blogs, with detailed data and use cases that show how this helps run Oracle software (and non-Oracle software) better than on any other Linux distribution out there.

    Getting started from ULN :

    Register your Oracle Linux 6 server with ULN using uln_register. On the ULN website, click on your server, go to Manage Subscriptions and add the ol6_x86_64_UEK_BETA channel to the list of channels for this server. Then on the server just run yum update kernel-uek, reboot and you are done.
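
    In command form, that boils down to this (the channel subscription itself happens on the ULN website):

    # uln_register
    # yum update kernel-uek
    # reboot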

    Getting started from the public-yum repository :

    # cd /etc/yum.repos.d
    # wget http://public-yum.oracle.com/beta/public-yum-ol6-beta.repo
    
    Edit the repo file and change enabled=0 to enabled=1
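
    If you'd rather not open an editor, a sed one-liner does the same thing, assuming the beta repo is the only section in that file:

    # sed -i 's/enabled=0/enabled=1/' public-yum-ol6-beta.repo
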
    # yum update kernel-uek
    Loaded plugins: refresh-packagekit, rhnplugin
    This system is not registered with ULN.
    ULN support will be disabled.
    uek2_beta                                                |  951 B     00:00
    uek2_beta/primary                                        | 339 kB     00:01
    uek2_beta                                                                   8/8
    Setting up Update Process
    Resolving Dependencies
    --> Running transaction check
    ---> Package kernel-uek.x86_64 0:2.6.39-100.0.12.el6uek set to be updated
    --> Processing Dependency: kernel-uek-firmware >= 2.6.39-100.0.12.el6uek for package: kernel-uek-2.6.39-100.0.12.el6uek.x86_64
    --> Running transaction check
    ---> Package kernel-uek-firmware.noarch 0:2.6.39-100.0.12.el6uek set to be updated
    --> Finished Dependency Resolution
    
    Dependencies Resolved
    
    ================================================================================
     Package                Arch      Version                    Repository    Size
    ================================================================================
    Installing:
     kernel-uek-firmware    noarch    2.6.39-100.0.12.el6uek     uek2_beta    1.1 M
         replacing  kernel-firmware.noarch 2.6.32-71.el6
         replacing  kernel-uek-firmware.noarch 2.6.32-100.28.5.el6
    Updating:
     kernel-uek             x86_64    2.6.39-100.0.12.el6uek     uek2_beta     25 M
    
    Transaction Summary
    ================================================================================
    Install       1 Package(s)
    Upgrade       1 Package(s)
    
    Total download size: 26 M
    Is this ok [y/N]:
    Downloading Packages:
    (1/2): kernel-uek-2.6.39-100.0.12.el6uek.x86_64.rpm      |  25 MB     00:51
    (2/2): kernel-uek-firmware-2.6.39-100.0.12.el6uek.noarch | 1.1 MB     00:05
    --------------------------------------------------------------------------------
    Total                                           452 kB/s |  26 MB     00:58
    Running rpm_check_debug
    Running Transaction Test
    Transaction Test Succeeded
    Running Transaction
    Warning: RPMDB altered outside of yum.
      Installing     : kernel-uek-firmware-2.6.39-100.0.12.el6uek.noarch        1/5
      Updating       : kernel-uek-2.6.39-100.0.12.el6uek.x86_64                 2/5
      Cleanup        : kernel-uek-2.6.32-100.28.5.el6.x86_64                    3/5
      Cleanup        : kernel-uek-firmware-2.6.32-100.28.5.el6.noarch           4/5
      Erasing        : kernel-firmware-2.6.32-71.el6.noarch                     5/5
    
    Installed:
      kernel-uek-firmware.noarch 0:2.6.39-100.0.12.el6uek
    
    Updated:
      kernel-uek.x86_64 0:2.6.39-100.0.12.el6uek
    
    Replaced:
      kernel-firmware.noarch 0:2.6.32-71.el6
      kernel-uek-firmware.noarch 0:2.6.32-100.28.5.el6
    
    Complete!
    

    At this point, your new kernel is installed and ready to be used. /etc/grub.conf contains an entry for this beta kernel. Just a simple reboot and you are ready to go.

    # reboot
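
    After the reboot, a quick check should confirm you are running the new kernel:

    # uname -r
    2.6.39-100.0.12.el6uek.x86_64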
    

    trying out dtrace

    One of the things we talked about at Oracle Openworld last week was DTrace for Oracle Linux. There hasn't been much information on it yet, so I wanted to write up what you need to do to give it a go.

    We released a preview of DTrace and we will provide updates as we include new features on an ongoing basis. The biggest project still to implement is userspace tracing, along with many side projects to include and enhance various DTrace providers in the kernel. So it's not all there yet, but it is going to be continuously enhanced with new updates.

    DTrace is made available to Oracle Linux support subscribers and currently requires you to do the following :

    - have your system registered with the Unbreakable Linux Network (ULN) (or a local yum repo that mirrors the channels on ULN)
    - run Oracle Linux 6, of course with the Unbreakable Enterprise Kernel
    - register your server with the DTrace channel (ol6_x86_64_Dtrace_BETA)
    - install the updated version of 2.6.32 and the dtrace kernel module and userspace programs
    - run it

    Here are some of the detailed steps :
    (1) register your server with ULN : run the uln_register command on your server as the root user. It requires you to enter your single sign-on ID, password and Oracle Linux support ID (CSI).
    (2) once your system(s) is registered, log in to the ULN website, click on Systems and find the server(s) you just registered
    (3) Click on the server and go to Manage Subscriptions
    (4) add the Dtrace for Oracle Linux 6 channel to your server and click Save
    (5) back on the server, double-check that everything worked by typing yum repolist; it should now also show ol6_x86_64_Dtrace_BETA
    (6) install the required packages :
    ... (a) yum install dtrace-modules
    ... (b) yum install dtrace-utils
    ... (c) yum install kernel-uek-2.6.32-201.0.4.el6uek
    You have to install that specific kernel from the dtrace ULN channel because it is the kernel with the instrumentation.
    (7) reboot into the new kernel
    (8) load the kernel modules: modprobe dtrace and modprobe systrace (steps (6) through (8) are combined in the sketch below)
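
    Put together, steps (6) through (8) boil down to :

    # yum install dtrace-modules dtrace-utils kernel-uek-2.6.32-201.0.4.el6uek
    # reboot
    # modprobe dtrace
    # modprobe systrace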

    Now you are ready to start playing with DTrace. We will publish a number of useful DTrace scripts to get you going. I have one quick example : when running an Oracle Database on my machine, I want to find out which files the oracle executable opens on the system. This simple command does it for me :

    dtrace -n 'syscall::open*:entry/execname == "oracle"/{ printf("%s %s", execname, copyinstr(arg0)); }'

    Any open* syscall executed by "oracle" will fire the probe and print the first argument, which in this case is the path being opened.

    A lot more to come, but existing customers who want to start taking a look at it should check it out. The 2.6.32-201.0.4 kernel source is available on ULN; the dtrace kernel module source is under the CDDL license and is also available on ULN. The userspace tools are provided as is. DTrace for our UEK2 beta kernel is not there yet but will come in an update of the UEK2 beta kernel. Another thing we are evaluating is using the ksplice technology to "splice in" the probes/providers at runtime. That way you could run a kernel without the extra code; if you want to enable DTrace, you first apply the ksplice dtrace update, run dtrace and afterwards unload it. The word evaluate is key here, however :).

    a little sample :

    # dtrace -n 'syscall::open*:entry/execname == "oracle"/{ printf("%s %s", execname, copyinstr(arg0)); }'
    dtrace: description 'syscall::open*:entry' matched 2 probes
    CPU     ID                    FUNCTION:NAME
      2      8                       open:entry oracle /proc/2672/stat
      2      8                       open:entry oracle /proc/2674/stat
      2      8                       open:entry oracle /proc/2678/stat
      2      8                       open:entry oracle /proc/2682/stat
      2      8                       open:entry oracle /proc/2686/stat
      2      8                       open:entry oracle /proc/2688/stat
      2      8                       open:entry oracle /proc/2690/stat
      2      8                       open:entry oracle /proc/2692/stat
      2      8                       open:entry oracle /proc/2694/stat
      3      8                       open:entry oracle /proc/loadavg
      1      8                       open:entry oracle /u01/app/oracle/oradata/XE/system.dbf
    
    ...

    Sunday Oct 02, 2011

    Oracle VM 3.0.2 patch update

    Last Friday (9/30) we uploaded the first patch for Oracle VM 3.0 to My Oracle Support. You can download the upgrade ISO from the website; just look for patch 13036236. When you download the patch file p13036236_30_Linux-x86-64.zip (about 50MB), just unzip the file, mount the included ISO image somewhere (or burn a CD) and, as user root, start the runUpgrader.sh script.
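
    In practice that looks something like this; the ISO filename inside the zip is an example, use whatever the archive actually contains:

    # unzip p13036236_30_Linux-x86-64.zip
    # mount -o loop OracleVM-Manager-3.0.2-upgrade.iso /mnt    # filename is an example
    # cd /mnt
    # ./runUpgrader.sh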

    The upgrade utility will be able to upgrade from an existing Oracle VM 3.0.1 installation.
    As always with an upgrade, it's recommended to do a full database repository backup, or just run the exp utility against your Oracle VM Manager repository database and export the schema that's used to store the repository (defaults to ovs). Just in case.
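
    For the exp route, something along these lines does it; the schema name defaults to ovs and the password is whatever you chose at install time:

    # exp ovs/<password> owner=ovs file=ovs_backup.dmp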

    Oracle VM Manager 3.0.1 has to be up and running for this process to start; the upgrade process will undeploy the application containers and replace them with the newer version. Most of the information needed to upgrade will be detected automatically. In version 3.0.1 we save the installation configuration to a config file in /u01/app/oracle/ovm-manager-3/.config. The only thing you will need to provide is the password for the database repository schema.

    We do export the metadata into xml files and the upgrade script backs these files up as well (/tmp/ovm-manager-3-backup-(date).zip). It would be possible to import this backup into the database again, but I would still recommend an exp of the database; it will be more convenient.
    After the xml files are created, the upgrade tool will transform the data into the new format, delete the old data in the database and import the new version.

    To upgrade the Oracle VM servers, you can register with the ovm3_3.0.2_x86_64_base channel on the Unbreakable Linux Network (ULN) and yum upgrade the servers.

    The detailed documentation can be found here.

    If you do not already have Oracle VM 3.0.1 installed, a new installer ISO image for Oracle VM 3.0.2 will also be made available in the next several days. If you can't wait, download 3.0.1 + the 3.0.2 patch and you're good to go as well.
    About

    Wim Coekaerts is the Senior Vice President of Linux and Virtualization Engineering for Oracle. He is responsible for Oracle's complete desktop to data center virtualization product line and the Oracle Linux support program.

    You can follow him on Twitter at @wimcoekaerts
