On Amazon EC2's underlying architecture


 

Most people know that Amazon Web Services is one of the big players in the cloud computing business, and its infrastructure-as-a-service offering EC2 in particular is becoming increasingly popular. Few people know that EC2 is probably one of the biggest Xen installations deployed. But how many know how EC2 actually works and how the underlying architecture is constructed? I was curious and needed that kind of insight for my master's thesis, which deals with EC2 from a security perspective. The following notes were gathered out of plain curiosity and for academic purposes only. They are not complete and a lot of guessing is involved, so I might be wrong (please leave a comment if you think so).

Hypervisor & dom0

As I said, Amazon EC2 is probably one of the biggest Xen installations deployed. It is said that Amazon uses a heavily modified and adapted version of Xen, but unfortunately I was not able to gather information about exact version numbers. dom0, the Xen management domain, can be based on Linux, NetBSD, or OpenSolaris. Based on the information I have gathered about the storage setup, I am very certain that it is Linux based. I do not know the version of the kernel in use. Amazon seems to be fond of Red Hat Linux, so dom0 might be running Red Hat Linux.

Storage

Amazon EC2 uses two different kinds of storage. One is local storage, known as Instance Storage, which is non-persistent: data is lost after an instance terminates. The other kind is persistent, network-based storage called Elastic Block Store (EBS), which can be attached to running instances or used as a persistent boot medium. I have excluded Amazon S3 here. The information in this section was gathered from xenstore, which holds configuration information about all domains. A domain can read its own configuration information from xenstore using xenstore-ls, which is part of the Xen utils.
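To work with the xenstore entries quoted in the following sections, it helps to flatten the xenstore-ls output into path/value pairs. The following is a minimal sketch, assuming the output consists of key = "value" lines with nesting indicated by leading spaces. The tree layout and the device number in SAMPLE are hypothetical; only the node, params and domain values are taken from this post.

```python
import re

# Matches lines such as:   params = "/dev/gnbd89"
LINE_RE = re.compile(r'^([ \t]*)(\S+) = "(.*)"\s*$')


def parse_xenstore_ls(text):
    """Flatten indented `key = "value"` lines into a {path: value} dict."""
    entries = {}
    stack = []  # (indent, key) pairs for the ancestors of the current line
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank or unrecognised lines
        indent = len(match.group(1))
        key, value = match.group(2), match.group(3)
        # Drop ancestors that are not shallower than this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, key))
        entries["/".join(k for _, k in stack)] = value
    return entries


# Hypothetical layout; the quoted values are the ones reported in this post.
SAMPLE = '''\
domain = "dom_32504936"
backend = ""
 vbd = ""
  2049 = ""
   node = "/dev/loop13"
   params = "/mnt/instance_image_store_3/262768"
'''

if __name__ == "__main__":
    for path, value in parse_xenstore_ls(SAMPLE).items():
        print(path, "=", value)
```

On a real instance the text would come from running xenstore-ls inside the domain rather than from a literal string.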

Instance Storage

Instance Storage appears as three partitions to the VM: sda1 for root, sda2 for extra storage space (/mnt), and sda3 for swap. Typically the backends of these virtual block devices are loopback devices and/or LVM logical volumes. Logical volumes are considered to offer better performance and reliability than loopback devices. I was therefore surprised that sda1 uses a loopback backend, as indicated by node = "/dev/loop13" in the xenstore vbd entry. The actual file used by the loopback device is params = "/mnt/instance_image_store_3/262768". The suffix "_3" of the directory is probably based on the local domain ID. The numerical filename of the image is not the same as the AMI ID, which is surprising. I also do not know whether /mnt/instance_image_store_3 is actually a locally stored directory or whether the images are mounted via, e.g., NFS. The latter would make sense, because they would only need to create a copy of the AMI image on the image storage server, maybe even using copy-on-write, and would not have to transfer the image over the network to the node.

The swap device is an LVM logical volume, denoted params = "/dev/VolGroupDomU/instance_swap_store_3" in xenstore. The extra storage space on /mnt also uses a logical volume backend: params = "/dev/mapper/cow-VolGroupDomU-instance_ephemeral_store_3". An interesting aspect is that they seem to use copy-on-write (COW) for this volume, but I don't know why.

Elastic Block Store

The characteristics of the Elastic Block Store (EBS) lead to the conclusion that it is probably a network-based storage setup. I had guessed that they probably use iSCSI volumes exported to the nodes, and I was surprised that they use something different: Global Network Block Device (GNBD). The backend device of an EBS volume is listed as something like params = "/dev/gnbd89". Of course, I don't know how the block device exporter is designed or what kind of storage system they use on that side. GNBD is documented as part of Red Hat's cluster tooling, which could also lead to the conclusion that Amazon uses Red Hat's cluster suite.
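The params values quoted in the storage sections can be told apart by their path prefix alone. The sketch below is a rough heuristic built only from the four paths seen in this post; the function name and the returned labels are my own, and the rule that any /dev/<group>/<volume> path is an LVM volume is an assumption that will not hold on every system.

```python
def classify_backend(params):
    """Guess the dom0 backend type from a xenstore `params` path (heuristic)."""
    if params.startswith("/dev/gnbd"):
        return "gnbd"  # network block device, as seen for EBS volumes
    if params.startswith("/dev/loop"):
        return "loopback"
    if params.startswith("/dev/mapper/cow-"):
        return "lvm-cow"  # device-mapper name hinting at copy-on-write
    if params.startswith("/dev/mapper/"):
        return "lvm"
    parts = params.split("/")
    # Assumption: /dev/<volume group>/<logical volume> is an LVM volume.
    if len(parts) == 4 and parts[1] == "dev":
        return "lvm"
    return "image-file"  # a plain file, attached through a loop device


if __name__ == "__main__":
    for path in (
        "/mnt/instance_image_store_3/262768",
        "/dev/VolGroupDomU/instance_swap_store_3",
        "/dev/mapper/cow-VolGroupDomU-instance_ephemeral_store_3",
        "/dev/gnbd89",
    ):
        print(path, "->", classify_backend(path))
```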

Networking

The networking setup of EC2 is quite unorthodox and I haven't figured it out completely yet. Amazon uses a routed Xen network setup with DHCP providing private IP addresses to the VMs. A traceroute shows the private IP address of the router in dom0 as well as the external IP address of dom0. Furthermore, the network setup script is named script = "/etc/xen/scripts/ec2-vif-route-dhcpd" in xenstore. A VM has only one interface, with a private IP address, so we have to assume that EC2 uses NAT to translate the external IP address to the internal one.

On L2 they also seem to use NAT, because the MAC address of all incoming and outgoing packets is EF:FF:FF:FF:FF:FF. They also prevent IP spoofing and ARP poisoning, which suggests that they filter on each virtual interface in dom0 based on the L2/L3 address of that particular VM. Furthermore, the security groups of EC2 are probably implemented in a similar way. I would not be surprised if Amazon uses ebtables and iptables in their dom0.
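The kind of per-interface filtering guessed at above can be modelled in a few lines. This is only an illustration of the idea, not Amazon's implementation: the VifBinding type, the function names and the example addresses are all hypothetical, and a real setup would express the same checks as ebtables/iptables rules in dom0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VifBinding:
    """The one MAC/IP pair dom0 expects on a virtual interface (hypothetical)."""
    mac: str
    ip: str


def allow_outgoing(binding, src_mac, src_ip):
    """Pass a frame only if both source addresses belong to this VM."""
    return src_mac.lower() == binding.mac.lower() and src_ip == binding.ip


def allow_arp_reply(binding, sender_mac, sender_ip):
    """An ARP reply may only announce the VM's own MAC/IP pair."""
    return allow_outgoing(binding, sender_mac, sender_ip)


if __name__ == "__main__":
    vm = VifBinding(mac="02:00:00:00:00:01", ip="10.0.0.5")  # made-up addresses
    print(allow_outgoing(vm, "02:00:00:00:00:01", "10.0.0.5"))   # legitimate
    print(allow_outgoing(vm, "02:00:00:00:00:01", "10.0.0.99"))  # spoofed IP
    print(allow_arp_reply(vm, "02:00:00:00:00:01", "10.0.0.1"))  # ARP poisoning
```

Binding each virtual interface to a single address pair means a spoofed source IP or a forged ARP announcement is dropped before it leaves the host.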

Domain naming

Via xenstore one can also determine the name of a VM, which in Amazon EC2 is something like domain = "dom_32504936". Based on a limited sample of domain names, it seems that the numeric suffix is incremental. Assuming that the domain name is unique throughout the entire lifetime of EC2, this would mean that Amazon has started 32.5 million VMs in EC2. The difference between the domain name suffixes of two instances started 24 hours apart was around 82,000, which could lead to the conclusion that around 82,000 VMs were started in that time period (assuming the suffix is actually incremental). It would be interesting to monitor the domain name suffixes and thereby track the utilization of EC2 over time, e.g. how many VMs are started in specific time frames.
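The estimate above is simple arithmetic on two domain names. A small sketch, assuming the suffix really is a global counter: the later name is the one quoted in this post, while the earlier name is a hypothetical value chosen so that the difference matches the roughly 82,000 reported here.

```python
def domain_suffix(domain_name):
    """Extract the numeric suffix from a name such as 'dom_32504936'."""
    return int(domain_name.rsplit("_", 1)[1])


def launches_between(earlier_name, later_name):
    """Number of VMs started between two samples, if the suffix is a counter."""
    return domain_suffix(later_name) - domain_suffix(earlier_name)


def launch_rate_per_hour(earlier_name, later_name, hours_apart):
    """Average VM launches per hour between the two samples."""
    return launches_between(earlier_name, later_name) / hours_apart


if __name__ == "__main__":
    earlier = "dom_32422936"  # hypothetical sample taken 24 hours earlier
    later = "dom_32504936"    # value quoted in this post
    print(launches_between(earlier, later))
    print(round(launch_rate_per_hour(earlier, later, 24)))
```

With these two samples the average works out to roughly 3,400 launches per hour; sampling at regular intervals would show how that rate changes over time.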

Outlook

So far I am satisfied with the information about the storage setup, but I definitely need to get a better understanding of the networking. I will keep this post updated as I gain new insights. If anyone has more information about any of the components, please leave a comment.

 

http://openfoo.org/blog/amazon_ec2_underlying_architecture.html

 
