Containers on Linux

At Oracle OpenWorld we talked about Linux Containers. Here is an example of getting a Linux container going with Oracle Linux 6.1, UEK2 beta and btrfs. This is just an example, not released, production, bug-free... for those that don't read README files ;-)

This container example uses the existing Linux cgroups features in the mainline kernel (also present in UEK and UEK2) and the lxc tools to create the environments.

Example assumptions :
- Host OS is Oracle Linux 6.1 with UEK2 beta.
- using btrfs filesystem for containers (to make use of snapshot capabilities)
- mounting the fs in /container
- use Oracle VM templates as a base environment
- Oracle Linux 5 containers
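
Before starting, it can help to confirm that the host actually has the pieces listed above. The following is only a minimal sketch: the function names missing_tools and cgroups_available are made up for illustration, and the list of tools to check is an assumption drawn from the commands used later in this post.

```shell
#!/bin/sh
# Sketch: report which of the tools used in this walkthrough are missing.
# Function names are hypothetical helpers, not part of the lxc tools.

missing_tools() {
    # Print the name of every argument that is not a command found on PATH.
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

cgroups_available() {
    # /proc/cgroups is only present when the running kernel supports cgroups.
    [ -r /proc/cgroups ]
}
```

For example, `missing_tools mkfs.btrfs btrfs kpartx lxc-create lxc-start` prints nothing when everything is installed, and `cgroups_available || echo "no cgroup support"` flags a kernel without cgroups.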

I have a second disk on my test machine (/dev/sdb) which I will use for this exercise.

# mkfs.btrfs  -L container  /dev/sdb

# mount
/dev/mapper/vg_wcoekaersrv4-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda1 on /boot type ext4 (rw)
/dev/mapper/vg_wcoekaersrv4-lv_home on /home type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/mapper/loop0p2 on /mnt type ext3 (rw)
/dev/mapper/loop1p2 on /mnt2 type ext3 (rw)
/dev/sdb on /container type btrfs (rw)
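
The mount command itself is not shown above, only its result. A minimal sketch of the two steps could look like the following, assuming the mount point has to be created first and that the device is mounted directly by name. The run helper and the DRY_RUN variable are hypothetical conveniences so the commands can be previewed before they touch a disk.

```shell
#!/bin/sh
# Sketch: format a spare disk as btrfs and mount it on /container.
# DRY_RUN defaults to 1, so commands are only printed. Set DRY_RUN=0 (as
# root) to really run them. mkfs.btrfs destroys any data on the device.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_container_fs() {
    local dev="$1"                 # e.g. /dev/sdb, the spare disk in this post
    local mnt="${2:-/container}"   # mount point used throughout this post
    run mkfs.btrfs -L container "$dev" &&
    run mkdir -p "$mnt" &&
    run mount -t btrfs "$dev" "$mnt"
}
```

Calling `setup_container_fs /dev/sdb` prints the three commands. Running it again with DRY_RUN=0 executes them.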

lxc tools installed...

# rpm -qa|grep lxc
lxc-libs-0.7.5-2.x86_64
lxc-0.7.5-2.x86_64
lxc-devel-0.7.5-2.x86_64

lxc tools come with template config files :

# ls /usr/lib64/lxc/templates/
lxc-altlinux lxc-busybox lxc-debian lxc-fedora lxc-lenny lxc-ol4 lxc-ol5 lxc-opensuse lxc-sshd lxc-ubuntu
I created one for Oracle Linux 5 : lxc-ol5.

Download Oracle VM template for OL5 from http://edelivery.oracle.com/linux. I used OVM_EL5U5_X86_PVM_10GB.
We want to be able to create 1 environment that can be used in both container and VM mode to avoid duplicate effort.

Untar the VM template.

# tar zxvf OVM_EL5U5_X86_PVM_10GB.tar.gz
These are the steps needed (to be automated in the future)...
Copy the content of the VM virtual disk's root filesystem into a btrfs subvolume in order to easily clone the base template.

My template configure script defines :
template_path=/container/ol5-template

- create subvolume ol5-template on /container

# btrfs subvolume create /container/ol5-template
Create subvolume '/container/ol5-template'
- loopback mount the Oracle VM template System image / partition
# kpartx -a System.img 
# kpartx -l System.img 
loop0p1 : 0 192717 /dev/loop0 63
loop0p2 : 0 21607425 /dev/loop0 192780
loop0p3 : 0 4209030 /dev/loop0 21800205
kpartx maps each partition of the virtual disk image to a device under /dev/mapper. I need the second partition, loop0p2, which contains the Oracle Linux 5 / filesystem of the template, so let's mount that one.
# mount /dev/mapper/loop0p2 /mnt

# ls /mnt
bin  boot  dev  etc  home  lib  lost+found  media  misc  mnt  opt  proc  
root  sbin  selinux  srv  sys  tftpboot  tmp  u01  usr  var
Great, now we have the entire template / filesystem available. Let's copy this into our subvolume. This subvolume will then become the basis for all OL5 containers.
# cd /mnt
# tar cvf - * | ( cd /container/ol5-template ; tar xvf - ; )
In the near future we will put some automation around the above steps.
# pwd
/container/ol5-template

# ls
bin  boot  dev  etc  home  lib  lost+found  media  misc  mnt  opt  proc  
root  sbin  selinux  srv  sys  tftpboot  tmp  u01  usr  var
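
Until that automation exists, the manual steps above could be wrapped in a small script. This is a sketch under stated assumptions, not the script that will ship: the function names, the run helper and the DRY_RUN variable are all hypothetical, the partition name is passed in by hand (loop0p2 in the output above) rather than detected, and nothing is cleaned up if a step fails halfway.

```shell
#!/bin/sh
# Sketch: copy the root filesystem of an Oracle VM template image into a
# btrfs subvolume. DRY_RUN defaults to 1 (print only). Set DRY_RUN=0 as root.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

copy_tree() {
    # Same tar pipe as above, but with -C instead of cd, and "." instead
    # of "*" so that dot-files at the top level are copied as well.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ tar -C $1 -cf - . | tar -C $2 -xpf -"
    else
        tar -C "$1" -cf - . | tar -C "$2" -xpf -
    fi
}

import_template() {
    local img="$1"           # virtual disk image, e.g. System.img
    local part="$2"          # mapped root partition, e.g. loop0p2
    local dest="$3"          # target subvolume, e.g. /container/ol5-template
    local mnt="${4:-/mnt}"   # temporary mount point
    run btrfs subvolume create "$dest" &&
    run kpartx -a "$img" &&
    run mount "/dev/mapper/$part" "$mnt" &&
    copy_tree "$mnt" "$dest" &&
    run umount "$mnt" &&
    run kpartx -d "$img"
}
```

With this in place, `import_template System.img loop0p2 /container/ol5-template` previews the whole sequence. The same function would cover the Oracle Linux 4 image further down with a different target subvolume.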
From this point on, the lxc-create script, using the template config as an argument, should be able to automatically create a snapshot and set up the filesystem correctly.
# lxc-create -n ol5test1 -t ol5

Cloning base template /container/ol5-template to /container/ol5test1 ...
Create a snapshot of '/container/ol5-template' in '/container/ol5test1'
Container created : /container/ol5test1 ...
Container template source : /container/ol5-template
Container config : /etc/lxc/ol5test1
Network : eth0 (veth) on virbr0
'ol5' template installed
'ol5test1' created

# ls /etc/lxc/ol5test1/
config  fstab

# ls /container/ol5test1/
bin  boot  dev  etc  home  lib  lost+found  media  misc  mnt  opt  proc  
root  sbin  selinux  srv  sys  tftpboot  tmp  u01  usr  var
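
The "Create a snapshot of ..." line in the output above is what btrfs prints when it snapshots a subvolume. The lxc-ol5 template script itself is not shown in this post, so the following is only a sketch of what its cloning step might boil down to when done by hand. The clone_template function, the run helper and DRY_RUN are hypothetical names, and the sketch does not write the config and fstab files that lxc-create generates.

```shell
#!/bin/sh
# Sketch: clone a base template subvolume into a new container root.
# DRY_RUN defaults to 1 (print only). Set DRY_RUN=0 as root to run for real.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

clone_template() {
    local template="$1"            # e.g. ol5-template
    local name="$2"                # e.g. ol5test1
    local root="${3:-/container}"  # btrfs mount point used in this post
    # Refuse to overwrite an existing container root.
    if [ -e "$root/$name" ]; then
        echo "error: $root/$name already exists" >&2
        return 1
    fi
    # A btrfs snapshot is copy-on-write, so the clone shares all unchanged
    # blocks with the template and is created almost instantly.
    run btrfs subvolume snapshot "$root/$template" "$root/$name"
}
```

Because the snapshot shares blocks with the template, each new container initially costs very little extra disk space.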
Now that it's created and configured, we should be able to just simply start it :
# lxc-start -n ol5test1
INIT: version 2.86 booting
                Welcome to Enterprise Linux Server
                Press 'I' to enter interactive startup.
Setting clock  (utc): Sun Oct 16 06:08:27 EDT 2011         [  OK  ]
Loading default keymap (us):                               [  OK  ]
Setting hostname ol5test1:                                 [  OK  ]
raidautorun: unable to autocreate /dev/md0
Checking filesystems
                                                           [  OK  ]
mount: can't find / in /etc/fstab or /etc/mtab
Mounting local filesystems:                                [  OK  ]
Enabling local filesystem quotas:                          [  OK  ]
Enabling /etc/fstab swaps:                                 [  OK  ]
INIT: Entering runlevel: 3
Entering non-interactive startup
Starting sysstat:  Calling the system activity data collector (sadc): 
                                                           [  OK  ]
Starting background readahead:                             [  OK  ]
Flushing firewall rules:                                   [  OK  ]
Setting chains to policy ACCEPT: nat mangle filter         [  OK  ]
Applying iptables firewall rules:                          [  OK  ]
Loading additional iptables modules: no                    [FAILED]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:  
Determining IP information for eth0... done.
                                                           [  OK  ]
Starting system logger:                                    [  OK  ]
Starting kernel logger:                                    [  OK  ]
Enabling ondemand cpu frequency scaling:                   [  OK  ]
Starting irqbalance:                                       [  OK  ]
Starting portmap:                                          [  OK  ]
FATAL: Could not load /lib/modules/2.6.39-100.0.12.el6uek.x86_64/modules.dep: No such file or directory
Starting NFS statd:                                        [  OK  ]
Starting RPC idmapd: Error: RPC MTAB does not exist.
Starting system message bus:                               [  OK  ]
Starting o2cb:                                             [  OK  ]
Can't open RFCOMM control socket: Address family not supported by protocol

Mounting other filesystems:                                [  OK  ]
Starting PC/SC smart card daemon (pcscd):                  [  OK  ]
Starting HAL daemon:                                       [FAILED]
Starting hpiod:                                            [  OK  ]
Starting hpssd:                                            [  OK  ]
Starting sshd:                                             [  OK  ]
Starting cups:                                             [  OK  ]
Starting xinetd:                                           [  OK  ]
Starting crond:                                            [  OK  ]
Starting xfs:                                              [  OK  ]
Starting anacron:                                          [  OK  ]
Starting atd:                                              [  OK  ]
Starting yum-updatesd:                                     [  OK  ]
Starting Avahi daemon...                                   [FAILED]
Starting oraclevm-template...
Regenerating SSH host keys.
Stopping sshd:                                             [  OK  ]
Generating SSH1 RSA host key:                              [  OK  ]
Generating SSH2 RSA host key:                              [  OK  ]
Generating SSH2 DSA host key:                              [  OK  ]
Starting sshd:                                             [  OK  ]
Regenerating up2date uuid.
Setting Oracle validated configuration parameters.

Configuring network interface.
  Network device: eth0
  Hardware address: 52:19:C0:EF:78:C4

Do you want to enable dynamic IP configuration (DHCP) (Y|n)? 

... 
This will run the well-known Oracle VM template configure scripts and set up the container the same way as it would an Oracle VM guest.

The session that runs lxc-start is the local console. It is best to run this session inside screen so you can disconnect and reconnect.
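
One way to do that is to start the container in a detached screen session named after it. This is a minimal sketch: start_in_screen, run and DRY_RUN are hypothetical helper names, while the screen and lxc-start options are the standard ones.

```shell
#!/bin/sh
# Sketch: run the container console inside a detached screen session.
# DRY_RUN defaults to 1 (print only). Set DRY_RUN=0 as root to run for real.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

start_in_screen() {
    local name="$1"
    # -d -m starts the session detached, -S names it after the container.
    run screen -dmS "$name" lxc-start -n "$name"
}
```

After `start_in_screen ol5test1`, the console can be reached with `screen -r ol5test1` and left running again with ctrl-a d.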

At this point, I can use lxc-console to log into the local console of the container or, since the container has its internal network up and sshd is running, I can also just ssh into the guest.
# lxc-console -n ol5test1 -t 1

Enterprise Linux Enterprise Linux Server release 5.5 (Carthage)
Kernel 2.6.39-100.0.12.el6uek.x86_64 on an x86_64

host login: 
I can simply exit the console by pressing ctrl-a q.

From inside the container :
# mount
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)

# /sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr 52:19:C0:EF:78:C4  
          inet addr:192.168.122.225  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::5019:c0ff:feef:78c4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:141 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:8861 (8.6 KiB)  TX bytes:2476 (2.4 KiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:560 (560.0 b)  TX bytes:560 (560.0 b)

# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   2124   656 ?        Ss   06:08   0:00 init [3]  
root       397  0.0  0.0   1780   596 ?        Ss   06:08   0:00 syslogd -m 0
root       400  0.0  0.0   1732   376 ?        Ss   06:08   0:00 klogd -x
root       434  0.0  0.0   2524   368 ?        Ss   06:08   0:00 irqbalance
rpc        445  0.0  0.0   1868   516 ?        Ss   06:08   0:00 portmap
root       469  0.0  0.0   1920   740 ?        Ss   06:08   0:00 rpc.statd
dbus       509  0.0  0.0   2800   576 ?        Ss   06:08   0:00 dbus-daemon --system
root       578  0.0  0.0  10868  1248 ?        Ssl  06:08   0:00 pcscd
root       610  0.0  0.0   5196   712 ?        Ss   06:08   0:00 ./hpiod
root       615  0.0  0.0  13520  4748 ?        S    06:08   0:00 python ./hpssd.py
root       637  0.0  0.0  10168  2272 ?        Ss   06:08   0:00 cupsd
root       651  0.0  0.0   2780   812 ?        Ss   06:08   0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
root       660  0.0  0.0   5296  1096 ?        Ss   06:08   0:00 crond
root       745  0.0  0.0   1728   580 ?        SNs  06:08   0:00 anacron -s
root       753  0.0  0.0   2320   340 ?        Ss   06:08   0:00 /usr/sbin/atd
root       817  0.0  0.0  25580 10136 ?        SN   06:08   0:00 /usr/bin/python -tt /usr/sbin/yum-updatesd
root       819  0.0  0.0   2616  1072 ?        SN   06:08   0:00 /usr/libexec/gam_server
root       830  0.0  0.0   7116  1036 ?        Ss   06:08   0:00 /usr/sbin/sshd
root      2998  0.0  0.0   2368   424 ?        Ss   06:08   0:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient-eth0.leases -pf /var/run/dhc
root      3102  0.0  0.0   5008  1376 ?        Ss   06:09   0:00 login -- root     
root      3103  0.0  0.0   1716   444 tty2     Ss+  06:09   0:00 /sbin/mingetty tty2
root      3104  0.0  0.0   1716   448 tty3     Ss+  06:09   0:00 /sbin/mingetty tty3
root      3105  0.0  0.0   1716   448 tty4     Ss+  06:09   0:00 /sbin/mingetty tty4
root      3138  0.0  0.0   4584  1436 tty1     Ss   06:11   0:00 -bash
root      3167  0.0  0.0   4308   936 tty1     R+   06:12   0:00 ps aux
From the host :
# lxc-info -n ol5test1
state:   RUNNING
pid:     16539

# lxc-kill -n ol5test1

# lxc-monitor -n ol5test1
'ol5test1' changed state to [STOPPING]
'ol5test1' changed state to [STOPPED]
So creating more containers is trivial. Just keep running lxc-create.
# lxc-create -n ol5test2 -t ol5

# btrfs subvolume list /container
ID 297 top level 5 path ol5-template
ID 299 top level 5 path ol5test1
ID 300 top level 5 path ol5test2
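
To create a batch of containers in one go, the lxc-create call can be put in a loop. This is just a sketch: create_containers, run and DRY_RUN are hypothetical helper names, and the container names in the example are made up.

```shell
#!/bin/sh
# Sketch: create several containers from the same template.
# DRY_RUN defaults to 1 (print only). Set DRY_RUN=0 as root to run for real.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

create_containers() {
    local template="$1"   # template name as passed to lxc-create -t
    shift
    # Every remaining argument is a container name. Stop at the first failure.
    for name in "$@"; do
        run lxc-create -n "$name" -t "$template" || return 1
    done
}
```

For example, `create_containers ol5 ol5test3 ol5test4 ol5test5` would add three more snapshots of the OL5 template.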
lxc-tools will be uploaded to the uek2 beta channel to start playing with this.

Oracle Linux 4 example

The same principle works for Oracle Linux 4, using the template create script lxc-ol4. I started out with the OVM_EL4U7_X86_PVM_4GB template and followed the same steps.

# kpartx -a System.img 

# kpartx -l System.img 
loop0p1 : 0 64197 /dev/loop0 63
loop0p2 : 0 8530515 /dev/loop0 64260
loop0p3 : 0 4176900 /dev/loop0 8594775

# mount /dev/mapper/loop0p2 /mnt

# cd /mnt

# btrfs subvolume create /container/ol4-template
Create subvolume '/container/ol4-template'

# tar cvf - * | ( cd /container/ol4-template ; tar xvf - ; )

# lxc-create -n ol4test1 -t ol4

Cloning base template /container/ol4-template to /container/ol4test1 ...
Create a snapshot of '/container/ol4-template' in '/container/ol4test1'
Container created : /container/ol4test1 ...
Container template source : /container/ol4-template
Container config : /etc/lxc/ol4test1
Network : eth0 (veth) on virbr0
'ol4' template installed
'ol4test1' created

# lxc-start -n ol4test1
INIT: version 2.85 booting
/etc/rc.d/rc.sysinit: line 80: /dev/tty5: Operation not permitted
/etc/rc.d/rc.sysinit: line 80: /dev/tty6: Operation not permitted
Setting default font (latarcyrheb-sun16):                  [  OK  ]

                Welcome to Enterprise Linux
                Press 'I' to enter interactive startup.
Setting clock  (utc): Sun Oct 16 09:34:56 EDT 2011         [  OK  ]
Initializing hardware...  storage network audio done       [  OK  ]
raidautorun: unable to autocreate /dev/md0
Configuring kernel parameters:  error: permission denied on key 'net.core.rmem_default'
error: permission denied on key 'net.core.rmem_max'
error: permission denied on key 'net.core.wmem_default'
error: permission denied on key 'net.core.wmem_max'
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.core_uses_pid = 1
fs.file-max = 327679
kernel.msgmni = 2878
kernel.msgmax = 8192
kernel.msgmnb = 65536
kernel.sem = 250 32000 100 142
kernel.shmmni = 4096
kernel.shmall = 1073741824
kernel.sysrq = 1
fs.aio-max-nr = 3145728
net.ipv4.ip_local_port_range = 1024 65000
kernel.shmmax = 4398046511104
                                                           [FAILED]
Loading default keymap (us):                               [  OK  ]
Setting hostname ol4test1:                                 [  OK  ]
Remounting root filesystem in read-write mode:             [  OK  ]
mount: can't find / in /etc/fstab or /etc/mtab
Mounting local filesystems:                                [  OK  ]
Enabling local filesystem quotas:                          [  OK  ]
Enabling swap space:                                       [  OK  ]
INIT: Entering runlevel: 3
Entering non-interactive startup
Starting sysstat:                                          [  OK  ]
Setting network parameters:  error: permission denied on key 'net.core.rmem_default'
error: permission denied on key 'net.core.rmem_max'
error: permission denied on key 'net.core.wmem_default'
error: permission denied on key 'net.core.wmem_max'
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.core_uses_pid = 1
fs.file-max = 327679
kernel.msgmni = 2878
kernel.msgmax = 8192
kernel.msgmnb = 65536
kernel.sem = 250 32000 100 142
kernel.shmmni = 4096
kernel.shmall = 1073741824
kernel.sysrq = 1
fs.aio-max-nr = 3145728
net.ipv4.ip_local_port_range = 1024 65000
kernel.shmmax = 4398046511104
                                                           [FAILED]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:                                [  OK  ]
Starting system logger:                                    [  OK  ]
Starting kernel logger:                                    [  OK  ]
Starting portmap:                                          [  OK  ]
Starting NFS statd:                                        [FAILED]
Starting RPC idmapd: Error: RPC MTAB does not exist.
Mounting other filesystems:                                [  OK  ]
Starting lm_sensors:                                       [  OK  ]
Starting cups:                                             [  OK  ]
Generating SSH1 RSA host key:                              [  OK  ]
Generating SSH2 RSA host key:                              [  OK  ]
Generating SSH2 DSA host key:                              [  OK  ]
Starting sshd:                                             [  OK  ]
Starting xinetd:                                           [  OK  ]
Starting crond:                                            [  OK  ]
Starting xfs:                                              [  OK  ]
Starting anacron:                                          [  OK  ]
Starting atd:                                              [  OK  ]
Starting system message bus:                               [  OK  ]
Starting cups-config-daemon:                               [  OK  ]
Starting HAL daemon:                                       [  OK  ]
Starting oraclevm-template...
Regenerating SSH host keys.
Stopping sshd:                                             [  OK  ]
Generating SSH1 RSA host key:                              [  OK  ]
Generating SSH2 RSA host key:                              [  OK  ]
Generating SSH2 DSA host key:                              [  OK  ]
Starting sshd:                                             [  OK  ]
Regenerating up2date uuid.
Setting Oracle validated configuration parameters.

Configuring network interface.
  Network device: eth0
  Hardware address: D2:EC:49:0D:7D:80

Do you want to enable dynamic IP configuration (DHCP) (Y|n)? 
...
...

# lxc-console -n ol4test1

Enterprise Linux Enterprise Linux AS release 4 (October Update 7)
Kernel 2.6.39-100.0.12.el6uek.x86_64 on an x86_64

localhost login: 

Comments:

Very Cool Wim. I realize that this is a quick mechanism to test something out on a developer machine (cgroups are very cool. I really need to learn them better.. much better than old limits mechanism). However, for production use, how might you see this fitting in (I'm trying to compare to Xen). Would there ever be a way to "pause" a container or migrate that container to another host? What about shared resources? Is it possible to have a common rootvg and have multiple containers pointing to that same rootvg (just use different /var, /etc/, ...) I'm thinking you could just have symlinks in your /container/ol5-template/usr/ -> /usr/, as an example? I guess this might break any chrooting a bit (if that gets used), but seems like it might be doable.

Posted by Joe Hoot on October 16, 2011 at 10:43 PM PDT #

Well, they're different in many ways.

If you don't mind a single kernel image for all your containers and the fact that you have to bring down the box to upgrade it (although with ksplice on oracle linux you would be able to avoid that :-) so ksplice makes containers very cool on linux)...

If you run many isolated apps, or want that isolation with no overhead from extra scheduling and with easy sharing of system resources, then containers are a good thing. That's why they're so widely used in the ISP world: you can host many containers on a server. With hypervisor-based virtualization, the amount of sharing goes down and there's a level of overhead for scheduling VMs (not much, but some). Of course, you have a greater level of isolation and much greater flexibility in terms of what you run in a VM versus a container.

So it's really a toss-up. Basically, if you run one OS and don't need live migration or the like, this is a somewhat better model. One is certainly not a replacement for the other; supporting both is quite useful.

Posted by Wim on October 17, 2011 at 03:40 AM PDT #

hi wcoekaer ,
I want to contribute by making a small correction to your tutorial.

To avoid execution error
on the line that says:
# tar cvf - * | ( cd /container/ol5-template ; tar xvf ; )

should say:
# tar cvf - * | ( cd /container/ol5-template ; tar xvf - ; )

Posted by guest on January 04, 2012 at 12:28 AM PST #

Great tutorial Wim, thanks! One thing to add to the comments for anybody else doing this: libvirt isn't default in the OL6 install, so that has to get installed/started ala-carte before starting the container.

Posted by Jon Senger on March 20, 2012 at 02:52 PM PDT #

Wim,

I started thinking a bit more about this. Do you have any quality tutorials (or any documentation outside of the kernel source readme on it) that might explain the best way to get started with cgroups? I'm not as concerned with lxc, but more so with just standard cgroups. I understand there is a libcgroups (or something like that) which has some configuration in /etc/ to allow the use of cgroups more easily (I think).

If you have anything to contribute on this, I'd appreciate it.

Thanks,
Joe

Posted by Joe Hoot on May 17, 2012 at 01:51 PM PDT #

Wim,

I was doing some more research on cgroups and found the following resources that I think are really good. In fact, I got to see the RedHat Summit presentation last year in Boston, and it was great to see the live demos (I don't recall seeing Linda Wang there though):

* http://www.redhat.com/summit/2011/presentations/summit/in_the_weeds/friday/WangKozdemba_f_1130_cgroups14.pdf
* http://linux.oracle.com/documentation/EL6/Red_Hat_Enterprise_Linux-6-Resource_Management_Guide-en-US.pdf
* http://www.oracle.com/technetwork/articles/servers-storage-admin/resource-controllers-linux-1506602.html

So what I've found is that it doesn't appear that libcgroup is available as an rpm in OL5, which is the only hangup that I currently have. I don't know if it has prereqs that are only available in OL6 or not, but if it doesn't then I would imagine that I can just compile it from sources... seems to be a typical configure/make/make install.. But before I go to that level, do you know if you've seen oracle-compiled rpms that are available in any yum repos? I didn't see them in the OracleLinux/OL5 repos at all.

Thanks,
Joe

Posted by Joseph Hoot on May 22, 2012 at 09:53 AM PDT #

About

Wim Coekaerts is the Senior Vice President of Linux and Virtualization Engineering for Oracle. He is responsible for Oracle's complete desktop to data center virtualization product line and the Oracle Linux support program.

You can follow him on Twitter at @wimcoekaerts
