Amazon AWS (EC2, S3) Environment Overview

Amazon AWS represents utility computing: on-demand, pay-as-you-go resources. Amazon AWS is also known as a highly scalable, developer-oriented web platform built on Xen hypervisor technology.

This entry is part of the 'OpenSolaris on Amazon EC2' workshop.

Amazon EC2 and S3

The main building blocks of Amazon AWS (Amazon Web Services) are the computing service EC2 (Elastic Compute Cloud), backed by the storage service S3 (Simple Storage Service).

The main target audience of Amazon AWS is developers who will build solutions on top of it. This is why the main interfaces to Amazon AWS are powerful APIs and simple CLI (command-line) tools built on top of them. Examples of solutions built on Amazon AWS are the end-user, customer-oriented RightScale services and the developer-oriented Scalr platform.

Each Amazon EC2 API version represents a concrete feature set; when a new feature is added, a new API version is released if needed. Both main Amazon AWS APIs (EC2, S3) are well integrated into many popular programming languages; see Popular Amazon AWS (EC2, S3) programming libraries.

On Amazon EC2 you run instances of pre-built, registered images called AMIs (Amazon Machine Images). AMIs are typical virtual appliances (OS and application images) based on Linux and now also on OpenSolaris. An Amazon EC2 user can build their own AMIs or directly use pre-built third-party images. By default AMIs are private, but any Amazon EC2 user can make their own images available to others, either for free or for a small fee.

Before you can start building complex scalable systems, you need a good base AMI and a good understanding of the principles of building such images. Providing the information needed to create good AMIs is one of the main goals of this workshop.

Amazon EC2 Instance Types

When Amazon EC2 was introduced there was only one instance type (Small); two 64-bit instance types were added later. In June 2008 the High-CPU instance types were also added, which have a proportionately higher ratio of CPU to memory and are well suited for compute-intensive applications. Currently you can use these five instance types in Amazon EC2:

  • Small Instance (default), m1.small: 1.7 GB memory; 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit); 160 GB instance storage (150 GB plus 10 GB root); 32-bit platform; I/O performance: Moderate; price: $0.10 per instance hour
  • Large Instance, m1.large: 7.5 GB memory; 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each); 850 GB instance storage (2 x 420 GB plus 10 GB root); 64-bit platform; I/O performance: High; price: $0.40 per instance hour
  • Extra Large Instance, m1.xlarge: 15 GB memory; 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each); 1,690 GB instance storage (4 x 420 GB plus 10 GB root); 64-bit platform; I/O performance: High; price: $0.80 per instance hour
  • High-CPU Medium Instance, c1.medium: 1.7 GB memory; 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each); 350 GB instance storage (340 GB plus 10 GB root); 32-bit platform; I/O performance: Moderate; price: $0.20 per instance hour
  • High-CPU Extra Large Instance, c1.xlarge: 7 GB memory; 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each); 1,690 GB instance storage (4 x 420 GB plus 10 GB root); 64-bit platform; I/O performance: High; price: $0.80 per instance hour
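
To make the "higher ratio of CPU to memory" point concrete, here is a small sketch that computes EC2 Compute Units per GB of memory and the hourly price per Compute Unit from the figures listed above. The dictionary layout and the function names are my own illustration and are not part of any Amazon API.

```python
# Figures copied from the instance type list above (USD per instance hour).
# The dictionary layout and helper names are illustrative assumptions.
INSTANCE_TYPES = {
    "m1.small":  {"memory_gb": 1.7,  "compute_units": 1,  "price": 0.10},
    "m1.large":  {"memory_gb": 7.5,  "compute_units": 4,  "price": 0.40},
    "m1.xlarge": {"memory_gb": 15.0, "compute_units": 8,  "price": 0.80},
    "c1.medium": {"memory_gb": 1.7,  "compute_units": 5,  "price": 0.20},
    "c1.xlarge": {"memory_gb": 7.0,  "compute_units": 20, "price": 0.80},
}


def compute_units_per_gb(name):
    """EC2 Compute Units per GB of memory for one instance type."""
    spec = INSTANCE_TYPES[name]
    return spec["compute_units"] / spec["memory_gb"]


def price_per_compute_unit(name):
    """Hourly price divided by the number of EC2 Compute Units."""
    spec = INSTANCE_TYPES[name]
    return spec["price"] / spec["compute_units"]


if __name__ == "__main__":
    for name in INSTANCE_TYPES:
        print(f"{name:10s} {compute_units_per_gb(name):5.2f} ECU/GB  "
              f"${price_per_compute_unit(name):.3f} per ECU-hour")
```

With these figures the High-CPU types work out to about $0.04 per Compute Unit per hour, against $0.10 for the standard types.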

AMIs are marked as 32-bit or 64-bit architecture at creation time. You can't start a 32-bit AMI on a 64-bit instance type!
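
The architecture rule can be sketched as a simple lookup. This is only an illustration: the platform widths come from the instance type list above, the function names are hypothetical, and I am assuming the rule also holds in the other direction (a 64-bit AMI will not start on a 32-bit instance type).

```python
# Platform width of each instance type, taken from the list above.
PLATFORM_BITS = {
    "m1.small": 32,
    "m1.large": 64,
    "m1.xlarge": 64,
    "c1.medium": 32,
    "c1.xlarge": 64,
}


def can_launch(ami_bits, instance_type):
    """True if the AMI architecture matches the instance type platform.

    Assumption: the match must be exact in both directions.
    """
    return PLATFORM_BITS[instance_type] == ami_bits


def compatible_instance_types(ami_bits):
    """Return the sorted instance types an AMI of this architecture can use."""
    if ami_bits not in (32, 64):
        raise ValueError("AMI architecture must be 32 or 64")
    return sorted(name for name in PLATFORM_BITS if can_launch(ami_bits, name))


if __name__ == "__main__":
    print(compatible_instance_types(32))
    print(compatible_instance_types(64))
```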

When you create an instance you can use the default kernel/ramdisk images specified at AMI creation time; sometimes you can specify alternative kernel/ramdisk images. For now, users can't create kernel/ramdisk images or use ones from inside the AMI (something like PyGRUB functionality is missing). The main reason is that Amazon wants to keep the quality of kernel/ramdisk images under control, so only selected vendors can create them. As a side effect, new updated kernel/ramdisk images may be booted, and your instance may need to be modified to work with them.

Amazon EC2 instances are purely dynamic in nature

When you start AMI instances, they get resources based on the instance type you select at start. Your SSH access key is copied into the instance on start, the instance gets dynamic internal and external IP addresses via DHCP, and the data storage is empty. When an instance is terminated or crashes, all instance data is lost. However, the docs say that a restart of the OS keeps all state and data, so a reboot is similar to Xen's xm shutdown followed by xm start.

A running instance can be stopped at any time, or it can simply crash. Your instance can also be rebooted if the EC2 hardware it runs on needs maintenance; in this situation the EC2 framework will do its best to shut down your instance correctly. However, the shutdown time is limited, similar to Xen's 'xm shutdown, sleep N seconds, xm destroy'.
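
The "shutdown, sleep N seconds, destroy" pattern can be tried out locally on an ordinary process. The sketch below is only an analogy using Python's subprocess module, not how EC2 or Xen implement it; the function name and the return values are made up for illustration.

```python
import subprocess
import sys


def stop_with_deadline(proc, grace_seconds):
    """Ask a process to exit, then kill it if it overruns the grace period.

    Returns "graceful" if the process exited in time, "forced" otherwise.
    """
    proc.terminate()                      # polite request, like "xm shutdown"
    try:
        proc.wait(timeout=grace_seconds)  # like "sleep N seconds"
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()                       # hard stop, like "xm destroy"
        proc.wait()
        return "forced"


if __name__ == "__main__":
    # Stand-in workload: a child Python process that sleeps for a minute.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print(stop_with_deadline(child, grace_seconds=5))
```

The practical lesson for your own services is the same: anything that must be saved on shutdown has to be saved well within the grace period, because the hard stop does not wait.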

By default a newly created instance is protected by Amazon EC2's integrated distributed firewall; you need to open ports to make the instance accessible. Firewall settings can be grouped into profiles, and these profiles can be used later when starting instances, also to better control inter-instance communication.
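
As a mental model only, the default-deny behaviour with grouped profiles can be sketched like this. The class and function names are hypothetical and are not the EC2 API; the sketch covers only TCP ports and ignores source addresses and inter-instance rules.

```python
from dataclasses import dataclass, field


@dataclass
class FirewallProfile:
    """A named set of open TCP ports (hypothetical model, not the EC2 API)."""
    name: str
    open_ports: set = field(default_factory=set)


def is_reachable(port, profiles):
    """Default deny: a port is reachable only if some profile opens it."""
    return any(port in profile.open_ports for profile in profiles)


# Two example profiles that could be combined when starting an instance.
admin = FirewallProfile("admin", {22})
web = FirewallProfile("web", {80, 443})

if __name__ == "__main__":
    print(is_reachable(22, []))             # no profile: nothing is open
    print(is_reachable(22, [admin]))        # SSH opened by the admin profile
    print(is_reachable(3306, [admin, web])) # MySQL port stays closed
```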

You can start more than one clone of the same AMI at one time, or several instances of different AMIs. You can also parametrize instances at start, but instances started together will all be identical and will all receive the same data. The number of running instances per account is limited (to 20?); if you need more, you need to ask Amazon sales.

Amazon EC2 Ephemeral - no persistent store

From Amazon EC2 developer docs:

"Every instance includes a fixed amount of storage space on which you can store data. Within this document, it is referred to as the ephemeral store as it is not designed to be a permanent storage solution. If an instance reboots (intentionally or unintentionally), the data on the ephemeral store will survive. If the underlying drive fails or the instance is terminated, the data will be lost."

The recommended solution is to use Amazon S3 as a simple backup/restore solution: back data up to S3 and restore it from there.

It is also known that the first write to ephemeral storage is much slower than subsequent writes (it is not clear whether this applies to the same location or just the same space). It's recommended to pre-allocate space with an initial dummy write of zeros, but for the full storage capacity this can be slow. To improve reliability and speed, various software RAID layouts can be used if you have more than one data drive.
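
A minimal sketch of the zero pre-write, assuming you point it at a file on the ephemeral store (the path and sizes are up to you, and the function name is my own). It writes in chunks so it does not need much memory, and syncs at the end so the writes really reach the disk.

```python
import os


def prewrite_zeros(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write total_bytes of zeros to path in chunks; return bytes written."""
    zeros = bytes(chunk_bytes)
    written = 0
    with open(path, "wb") as target:
        while written < total_bytes:
            step = min(chunk_bytes, total_bytes - written)
            target.write(zeros[:step])
            written += step
        target.flush()
        os.fsync(target.fileno())  # make sure the data actually reaches the disk
    return written
```

For example, prewrite_zeros("/mnt/fill.bin", 10 * 1024**3) would touch 10 GB; "/mnt/fill.bin" is only a placeholder path, so check where the ephemeral store is mounted in your AMI first.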

How the ephemeral store appears to the OS depends on the device mapping done in the AMI build process; however, I don't recommend changing the default setup.

Amazon EC2 Instance boot process

Many Amazon users complain that creating an instance, and/or the time until they can access the created instance, sometimes takes (very) long, so I will try to explain this process in more depth:

  • Start time - time until the EC2 CLI tool shows the running status
  • Boot time - time until you can access the instance (by SSH or through your application)

Start time is like the time from pressing the "power on" button on your computer until your OS starts booting from disk.
Boot time is like the time from seeing your OS start menu until the OS is loaded and you see the login screen.
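
If you want to measure boot time yourself, one simple approach is to poll the instance's SSH port from the moment the EC2 tools report the running status. The sketch below is an assumption-laden illustration: the function name is mine, and it only checks that the TCP port accepts a connection, not that login works.

```python
import socket
import time


def wait_for_port(host, port, timeout_seconds, poll_interval=1.0):
    """Poll a TCP port until it accepts a connection.

    Returns the elapsed seconds on success, or None if the timeout expires.
    """
    started = time.monotonic()
    deadline = started + timeout_seconds
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=poll_interval):
                return time.monotonic() - started
        except OSError:
            # Not reachable yet (refused or timed out): wait and retry.
            time.sleep(poll_interval)
    return None
```

Calling wait_for_port(public_hostname, 22, 600) right after the instance shows as running gives a rough boot time in seconds; public_hostname stands for whatever address the EC2 tools printed for your instance.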

Instance Start time

An AMI is stored on Amazon S3. On start it is downloaded into a local Amazon EC2 cache, unpacked from there onto a host, and finally booted.

It's good practice to keep images small in both respects: the compressed image size and a properly chosen run-time root disk size.

Only very specialized AMIs use an OS with all packages installed and/or the maximum 10 GB root disk size. Keep in mind that swap size needs to be taken into account too.

Because the AMI is cached, start time is also influenced by how often the image is used.

Because the start process uses shared resources, start time can depend on the current S3/EC2 load.

Instance boot time

Overall instance boot time depends on how many services you start at OS boot and on how complex an application you are starting. It's good practice to keep your execution profile small: limit the running services to only the ones really needed.

Of course, you are running in a Xen hypervisor VM, so boot time can be influenced by third-party load on the same hardware server.

Amazon EC2 Steps towards persistence

One of the larger Amazon EC2 issues deals with configuration and data persistence of instances.

Currently there are many limitations in Amazon AWS that make it difficult to migrate your application directly to the Amazon EC2 platform unless you carefully build your solution around the Amazon EC2 scalable architecture.

If your running instance crashes and you run it again, you will lose your data, and when the new instance comes up it will have a new public IP, so you need to deal with dynamic DNS issues.

To better address the needs of smaller Amazon EC2 users and to simplify Amazon AWS adoption, several steps towards persistence are under way:

Elastic IP

The new feature called Elastic IPs is a paid feature that acts as dynamic DNS on Amazon's side: it lets you assign a static, preallocated public IP address provided by Amazon to your currently running instance, with a low refresh time (up to minutes).

Persistent Storage

Amazon recently announced a plan to introduce Persistent Storage, but after a large and heated discussion this feature is not yet available to the public.

Persistent Storage Boosts Amazon Web Services; Enterprise Ambitions

The last Amazon AWS newsletter, No. 37, contained this announcement about it:

" Amazon EC2 Upcoming Feature: Persistent Storage

Last month, we announced a major upcoming feature that many of you have requested - persistent storage for Amazon EC2. This new feature provides reliable, persistent storage volumes for use with Amazon EC2 instances. These volumes exist independently from any Amazon EC2 instances, and will behave like raw, unformatted hard drives or block devices, which may then be formatted and configured based on the needs of your application. The volumes will be significantly more durable than the local disks within an Amazon EC2 instance. Additionally, our persistent storage feature will enable you to automatically create snapshots of your volumes and back them up to Amazon S3 for even greater reliability. We expect to make persistent storage for Amazon EC2 publicly available later this year. "

See more in the Persistent Storage planned feature discussion.

However, the full feature set and price are not yet known. Price can be a significant adoption factor here, because on Linux it is possible to emulate a disk on top of S3 storage with a FUSE (Filesystem in Userspace) module, and there is already a complete FUSE-based solution with a caching feature; see persistentFS for more.

Reliability and Accessibility

Amazon is also constantly improving the reliability and accessibility of its services; these three features were recently announced:

  • Availability Zones
  • AWS Premium Support (including MySQL support provided by SUN)
  • Service Health Dashboard, which reports the status of our services and is available free-of-charge to all.

Amazon S3 for Amazon EC2 users

Excerpt from the Amazon S3 docs:

"Amazon Simple Storage Service (Amazon S3) is storage for the Internet. It is designed to make web-scale computing easier for developers.

Amazon S3 is intentionally built with a minimal feature set. Built to be flexible so that protocol or functional layers can easily be added.

Write, read, and delete objects containing from 1 byte to 5 gigabytes of data each. The number of objects you can store is unlimited. Each object is stored in a bucket and retrieved via a unique, developer-assigned key.

A bucket can be located in the United States or in Europe. All objects within the bucket will be stored in the bucket's location, but the objects can be accessed from anywhere.

Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users.

API uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit. Default download protocol is HTTP. A BitTorrent can be also used."

Commentary on S3

Amazon S3 is simple, like a web server with a WebDAV interface where buckets are web subdomains. Amazon S3 is not like a file system!
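
To illustrate the "bucket as subdomain" idea, here is a sketch that builds the plain HTTP address of a publicly readable object. The bucket and key in the example are made up, and the function name is my own. Note that the slashes in a key are just characters in its name, not real directories.

```python
from urllib.parse import quote


def object_url(bucket, key):
    """Build the subdomain-style HTTP URL for a publicly readable S3 object."""
    # quote() escapes spaces and similar characters but keeps "/" as-is.
    return f"http://{bucket}.s3.amazonaws.com/{quote(key)}"


if __name__ == "__main__":
    print(object_url("my-bucket", "photos/2008/cat 1.jpg"))
```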

Don't assume you can easily store GB-sized files directly on S3, or easily work with many small files in many directories.

As a side effect, many tools that use S3 as backend storage have private formats and are not compatible with each other! It is common that when you store a directory structure with one tool, you can't read it with another. So evaluate, select, and use S3 tools in your project carefully.
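
To see where the private formats come from, consider how a tool might store a file that is larger than it wants to put into a single object: it has to invent its own naming scheme for the pieces. The sketch below shows one such made-up scheme (the ".partNNNN" suffix and the function name are purely hypothetical); another tool with a different scheme would not recognise these objects as one file.

```python
def plan_parts(name, total_bytes, max_part_bytes):
    """Return (key, offset, length) tuples covering a file of total_bytes."""
    if max_part_bytes <= 0:
        raise ValueError("max_part_bytes must be positive")
    parts = []
    offset = 0
    index = 0
    while offset < total_bytes:
        length = min(max_part_bytes, total_bytes - offset)
        # Hypothetical key scheme: original name plus a numbered suffix.
        parts.append((f"{name}.part{index:04d}", offset, length))
        offset += length
        index += 1
    return parts


if __name__ == "__main__":
    for key, offset, length in plan_parts("backup.tar", 25, 10):
        print(key, offset, length)
```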

A current limitation is that one S3 account can have up to 100 buckets; ask sales if you want more. Amazon S3 also has login credentials separate from EC2, and its usage is charged separately from EC2 charges.

Amazon AWS (EC2,S3) Known Issues

  • A running AMI must have the correct system time set; otherwise Amazon EC2 or S3 tools can fail to connect
  • Xen has several lines within the main 3.x version (3.0.x, 3.1.x, 3.2.x), and these lines are not always 100% compatible with each other
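
For the system time issue, a quick sanity check is to compare your clock with the Date header of any HTTP response from the service. The sketch below only does the comparison; the 15-minute tolerance is my assumption, so treat the exact figure with care, and the function names are my own.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumed tolerance; the exact limit enforced by the services may differ.
MAX_SKEW_SECONDS = 15 * 60


def clock_skew_seconds(server_date_header, local_now=None):
    """Seconds by which the local clock is ahead of an HTTP Date header."""
    server_time = parsedate_to_datetime(server_date_header)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return (local_now - server_time).total_seconds()


def clock_is_acceptable(server_date_header, local_now=None):
    """True if the local clock is within the assumed tolerance."""
    skew = clock_skew_seconds(server_date_header, local_now)
    return abs(skew) <= MAX_SKEW_SECONDS
```

If the check fails, fix the clock inside the instance (for example with an NTP client) before blaming the EC2 or S3 tools.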
Comments:

The persistent storage solution for Amazon EC2 now is Amazon EBS (Elastic block storage). With this functionality one can create a volume (from 1GB to 1TB) and attach it to an existent EC2 instance (it's like mounting a hard disk). EBS volumes are not dependent on the life of the instance so this solves the problem (however, you will be charged around $0.1 per GB-month, see http://aws.amazon.com/ebs).

Hope this has been useful.

David,

Web Developer at http://buzzyventure.com

Posted by David on September 28, 2008 at 06:16 AM PDT #
