Amazon EC2 OpenSolaris re-bundling process troubleshooting

This entry is based on a customer escalation. I hope it will help, or at least inspire, you to some extent too.

This entry is part of the 'OpenSolaris on Amazon EC2' workshop.


Target::

The customer wants to regularly back up a running Amazon EC2 OpenSolaris instance while its applications keep running.

Issue::

Re-bundling takes a long time and looks frozen (it also consumes a lot of resources).

Monitor the ZFS cloning process (rebundle.sh script)

Note: The rebundle script uses the ZFS mirror cloning mechanism implemented at the system level; as such, it has a built-in "share resources" capability.

Log in over a second connection and start this script alongside rebundle.sh:

Paste it into bash. The heredoc delimiter is quoted ('EOF'), so the $ and \ characters in the script need no extra escaping.

cat >/tmp/mon.ksh <<'EOF'
#!/bin/ksh
echo "ZFS Cloning started"

echo "Waiting for the clone process to really start"

# Phase 1: spin until the resilver shows up in the pool status
while true
do
   zpool status rpool | grep "resilver in progress" >/dev/null

   if [ $? -eq 0 ]
   then
      break
   else
      print -n -e "\b-"
      sleep 1
      print -n -e "\b\\"
      sleep 1
      print -n -e "\b|"
      sleep 1
      print -n -e "\b/"
      sleep 1
   fi
done

# Phase 2: show the elapsed/progress line until the resilver is gone
while true
do
   zpool status rpool | grep "resilver in progress" >/dev/null

   if [ $? -eq 1 ]
   then
      break
   else
      status=$(zpool status rpool | grep "resilver in progress" | gsed -e 's/ scrub: resilver in progress for/Elapsed/g')
      print -n -e "\r $status  "
      print -n -e "\b-"
      sleep 1
      print -n -e "\b\\"
      sleep 1
      print -n -e "\b|"
      sleep 1
      print -n -e "\b/"
      sleep 1
   fi
done

echo "ZFS Cloning ended"
exit 0
EOF

chmod 0755 /tmp/mon.ksh
/tmp/mon.ksh

Monitor and control resource usage during the ec2-bundle-image command

  1. Try adding -v (or --verbose) to the parameters.
  2. The ec2-bundle-image command is written in Ruby and simply uses pipes with gzip, so it is possible that, for some reason, there is not enough CPU or memory to process it.

MEM: Try running "sync; sleep 10" a couple of times so that the ZFS cache releases memory after cloning. Monitor memory, and check that you have swap and that you are not swapping while the ec2-bundle-image command executes.

Keep in mind that the EC2 tools were designed on Linux, where /tmp is on disk by default. On OpenSolaris, /tmp is in MEMORY by default, backed by SWAP.
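The memory checks above can be sketched as two small filters. This is only an illustration: swap_available_kb and tmp_on_swap are hypothetical helper names, and the parsing assumes the usual one-line "swap -s" summary and a "df -k" listing whose first column reads "swap" for a memory-backed /tmp.

```shell
# Hypothetical helpers, written for bash; not part of the EC2 tools.

# Reads `swap -s` output on stdin and prints the available swap in KB.
# Assumes the summary line ends with "<number>k available".
swap_available_kb() {
  sed -n 's/.* \([0-9][0-9]*\)k available.*/\1/p'
}

# Reads `df -k /tmp` output on stdin; succeeds if the filesystem
# column says "swap", i.e. /tmp lives in memory backed by swap.
tmp_on_swap() {
  awk 'NR > 1 && $1 == "swap" { found = 1 } END { exit !found }'
}
```

Typical use would be to run "for i in 1 2 3; do sync; sleep 10; done" after the cloning finishes, then compare "swap -s | swap_available_kb" before and during the bundle, and confirm the /tmp layout with "df -k /tmp | tmp_on_swap && echo /tmp is backed by swap".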

CPU: Try exporting the GZIP variable with level 1 compression; this will lower the CPU load (undocumented by Amazon, but it can help isolate the issue).

(export GZIP='-1'; ec2-bundle-image ...)
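Before lowering the level for a real bundle, it can help to see what the trade-off looks like on a sample of your own data. A minimal sketch, assuming bash and a standard gzip; gzip_level_sizes is a hypothetical helper, and it passes the level as a flag rather than through the GZIP variable so the comparison does not depend on the environment.

```shell
# Hypothetical helper: prints "<level> <compressed bytes>" for a few
# gzip levels so the size cost of a cheaper level is visible.
gzip_level_sizes() {
  local file=$1 level size
  for level in 1 6 9; do
    # Compress to stdout and count the bytes; the input file is untouched.
    size=$(gzip -c -"$level" < "$file" | wc -c | tr -d ' ')
    printf '%s %s\n' "$level" "$size"
  done
}
```

Running it on a slice of the image (for example the first few hundred MB) and timing each level separately gives a rough idea of how much CPU level 1 saves and how much larger the bundle becomes.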

Of course, from a CPU point of view, you can also try running the AMI on the second, more powerful 32-bit instance profile, which has 2 virtual CPUs.

What the ec2-bundle-image pipeline does

    cat /opt/ec2/lib/ec2/amitools/bundle.rb

        # Bundle the AMI procedure:
        # The image file is tarred - to maintain sparseness, gzipped for
        # compression and then encrypted with AES in CBC mode for
        # confidentiality.
        # To minimize disk I/O the file is read from disk once and
        # piped via several processes.
        # The tee is used to allow a digest of the file to be calculated
        # without having to re-read it from disk.

        pipeline.concat([
          ['tar', "#{openssl} sha1 < #{digest_pipe} & " + tar.expand],
          ['tee', "tee #{digest_pipe}"],
          ['gzip', 'gzip'],
          ['encrypt', "#{openssl} enc -e -aes-128-cbc -K #{key} -iv #{iv} > #{bundled_file_path}"]
          ]) 

1. You can inspect the Ruby bundle pipeline with Solaris commands

1.1 Get rebundle process PID

# ps -ef | grep "ec2-bundle-image -c"
   root  6205  2146   0 14:49:52 pts/1       0:00 /bin/bash /opt/ec2/bin/ec2-bundle-image -c

1.2 See command tree

# ptree 6205
       6205  /bin/bash /opt/ec2/bin/ec2-bundle-image -c /mnt/keys/cert-GW---------------------CF
         6206  ruby -I /opt/ec2/lib /opt/ec2/lib/ec2/amitools/bundleimage.rb -c /mnt/keys/cert
           6210  /bin/bash -c /usr/sfw/bin/openssl sha1 < /tmp/ec2-bundle-image-digest-pipe & /u
             6211  /usr/sfw/bin/openssl sha1
             6212  /usr/sfw/bin/gtar -c -h -S -C /mnt Glassfish_2008.11_32_1.0.img
             6213  tee /tmp/ec2-bundle-image-digest-pipe
             6214  gzip
             6215  /usr/sfw/bin/openssl enc -e -aes-128-cbc -K 48--keep-it-really-sercert--65

1.3 You can see the whole constructed pipeline as the arguments of the process directly below ruby:

# pargs 6210
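To find out which stage is actually consuming CPU, the PIDs from ptree can be handed to prstat in one go. A small sketch, assuming bash; pipeline_pids is a hypothetical helper that skips the ancestor lines ptree prints above the requested PID and joins the rest with commas, the list format prstat -p accepts.

```shell
# Hypothetical helper: reads `ptree <pid>` output on stdin and prints the
# given PID plus everything listed after it as a comma-separated list.
pipeline_pids() {
  awk -v pid="$1" '$1 == pid { found = 1 } found { print $1 }' | paste -s -d, -
}
```

With the PID from step 1.1 this would be used as: prstat -p "$(ptree 6205 | pipeline_pids 6205)"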

2. You can monitor, or even limit, the bandwidth of the pipeline itself by using PV (Pipe Viewer)

Note: For this you need to patch an Amazon EC2 Ruby library. This is an unsupported hack, for debugging purposes only.

PV (Pipe Viewer)

With PV you can limit the bandwidth so that the process is less resource intensive; for more, see the pv man page.

# pkg set-authority -O http://pkg.opensolaris.org/contrib contrib

# pkg install pv

#  pv -V
pv 1.1.4 - Copyright(C) 2008 Andrew Wood <andrew.wood@ivarch.com>

# cp /opt/ec2/lib/ec2/amitools/bundle.rb /opt/ec2/lib/ec2/amitools/bundle.rb.org
# vim /opt/ec2/lib/ec2/amitools/bundle.rb

['tar', "#{openssl} sha1 < #{digest_pipe} & " + tar.expand],
['pv','pv -q -L 500k'],
['tee', "tee #{digest_pipe}"],

Hint: If you want to use pv just for progress monitoring, use:

['pv','pv -N rebundle -Wpteb -s 10485761024 -B 500000 -f'],


You will see:

Rebundle:  753MB 0:02:36 [==>                           ]  7% ETA 0:31:56
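The percentage and ETA are only meaningful if the -s value is close to the real stream size. A minimal sketch, assuming bash, that generates the pv stage line from the image file itself; pv_progress_stage is a hypothetical helper, and the file size is only an approximation of the tar stream (tar adds headers, and sparse handling can shrink it).

```shell
# Hypothetical helper: prints a pv stage line for bundle.rb with -s set
# to the byte size of the given image file.
pv_progress_stage() {
  local image=$1 size
  [ -r "$image" ] || return 1
  size=$(wc -c < "$image" | tr -d ' ')
  printf "['pv','pv -N rebundle -Wpteb -s %s -B 500000 -f'],\n" "$size"
}
```

The printed line can then be pasted into bundle.rb in place of the hard-coded one.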

# start ec2-bundle-image command

# Get pv pid

ps -ef | grep pv
   root  6794  6791   0 15:38:46 pts/3       0:00 pv -N rebundle -Wptebf -s 10485761024 -B 68157440

Now you can limit the bandwidth with pv -R PID -L bandwidth

# pv -R 6794 -L 100K

You can also experiment with the -B buffer size parameter.