Qualitative and quantitative research on the availability of cloud computing (5)


Author: Chen Whilin, technical advisor to China Cloud Network, founder of Bending Review, investment advisor to North Aurora Venture Capital

4 Case Study: Amazon AWS

4.2 Amazon AWS Service Outage Survey (2006-2009)

Amazon AWS has experienced many service outages since it opened the S3 file storage service on March 14, 2006, the EC2 service on August 25, 2006, and the EBS service in August 2008. The outages have hit EC2, S3, and EBS alike, and their impact extends to the many important Internet companies that rely on these services.

On February 15, 2008, S3 suffered a severe outage that disrupted service for many AWS users. Amazon's AWS team reflected deeply on the incident and, on April 8, began offering the AWS Service Health Dashboard, which tracks the reliability of each service day by day.

This section attempts to compile a list of the major downtime events in AWS production services and to discuss each one in turn.

1. April 1, 2006

Less than one month after Amazon opened its S3 storage service, S3 crashed on April 1, 2006.

Cause of the accident: S3

Accident recovery: 6 hours

Accident explanation: The AWS team was performing load-balancing management on S3 storage. The result was an internal network overload that took the S3 subsystem out of service.

Related URL: https://forums.aws.amazon.com/thread.jspa?threadid=10185

2. September 29, 2007

Amazon's EC2 crashed, and some customers lost data. The EC2 management API was briefly suspended.

Cause of the accident: EC2

Accident recovery: 4 hours

Accident explanation:

Amazon's AWS team explained that a bug in AWS management software accidentally terminated some customers' virtual machines. To protect the AWS service as a whole, the team quickly disabled the EC2 management API.

Related URL: https://forums.aws.amazon.com/thread.jspa?threadid=17211&start=0&tstart=0

3. February 15, 2008

The February 15, 2008 outage was the first major incident that Amazon officially acknowledged and explained to the public. It raised the industry's awareness of, and vigilance about, the reliability of public clouds, and led directly to Amazon's decision to strengthen the monitoring and transparency of service availability.

Cause of the accident: S3

Accident recovery: 3 hours

Accident explanation:

The authentication service of the S3 subsystem could not withstand a sudden surge of service requests, which paralyzed S3. AWS's official explanation is reported at:

http://www.zdnet.com/blog/btl/amazon-explains-its-s3-outage/8010

After this major outage, the AWS team promised the industry a "Service Health Dashboard" that would give users a transparent view of the status of each AWS service.

4. June 5, 2008

On June 5, 2008, Amazon's data center in eastern Virginia was struck by lightning. Some EC2 services in the region went down.

Cause of the accident: Lightning

Accident recovery: N/A

Accident explanation:

Lightning cut power to the eastern Virginia data center, which caused the EC2 downtime.

Related URL: http://www.datacenterknowledge.com/archives/2008/06/05/brief-outage-for-amazon-web-services/

5. June 6, 2008

On June 6, 2008, Amazon's own online retail business suddenly went down, mainly the United States and United Kingdom sites. AWS itself showed no abnormality.

Cause of the accident: Amazon gave no official explanation for the accident. It offered only an informal remark that "Amazon's network system is very complex" and that an occasional problem is normal ...

Accident recovery: 3 hours

Accident explanation:

Because Amazon gave no formal explanation, the industry's guess was that part of Amazon's load-balancing infrastructure, such as the DNS service, had failed. Another theory is that Amazon was hit by a malicious DDoS attack launched via Trojan horses. The evidence cited is that Amazon's IMDB site (http://www.imdb.com) was under a DDoS attack combining traffic flooding and layer 7 requests while Amazon's main site was down. The attack traffic was probably about 3 Mbit/s.

[Figure: downtime of Amazon's US and UK sites on that day]

6. July 20, 2008

On July 20, 2008, S3 suffered another major outage. Many important customers were affected, such as Twitter, which stored essentially all of its images in Amazon's S3 system.

Cause of the accident: S3

Accident recovery: 8 hours

Accident explanation: Control-message traffic between the S3 servers went wrong, leaving the servers unable to process any user requests. Amazon also acknowledged that EC2 was affected and that some customers' virtual machines could not run. In addition, the Simple Queue Service (SQS) was impacted and interrupted.

AWS's official explanation: http://status.aws.amazon.com/s3-20080720.html

7. June 10, 2009

On June 10, 2009, AWS EC2 suffered a major outage. The data center was struck by lightning and lost power.

Cause of the accident: EC2

Accident recovery: 8 hours

Accident explanation:

Natural causes: lightning cut power to the data center.

Related URL: http://www.datacenterknowledge.com/archives/2009/06/11/lightning-strike-triggers-amazon-ec2-outage/

8. July 19, 2009

On July 19, 2009, AWS EC2 suffered performance problems and downtime.

Cause of the accident: EC2

Accident recovery: 2 hours

Accident explanation: N/A

Related URL: http://www.datacenterknowledge.com/archives/2009/07/19/outage-for-amazon-web-services/

9. October 5, 2009

On October 5, 2009, BitBucket, an online hosting company for open-source projects, was down for 19 hours on AWS.

Cause of the accident: EC2, EBS

Accident recovery: 19 hours

Accident explanation:

BitBucket's services on AWS were hit by a traffic-flooding attack. The attackers first used UDP flooding and then switched to TCP flooding. The service was interrupted for 19 hours. AWS's operations team showed a lack of experience in handling the incident.

Related URL: http://www.networkworld.com/community/node/45891

10. December 10, 2009

On December 10, 2009, AWS EC2 suffered an outage. The data center, located in Northern Virginia, was struck by lightning and lost power.

Cause of the accident: EC2

Accident recovery: 45 minutes

Accident explanation:

Natural causes: lightning cut power to the data center.

Related URL: http://www.datacenterknowledge.com/archives/2009/12/10/power-outage-for-amazon-data-center/
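To put the recovery times listed above in quantitative terms, the following is a minimal Python sketch that sums the reported downtime per calendar year and converts it to an availability percentage. It is an illustration under stated assumptions, not a method taken from the article: the function names `downtime_by_year` and `availability` are hypothetical, the incident with no reported recovery time (June 5, 2008) and the Amazon retail outage (June 6, 2008) are left out, and every year is treated as a full 8,760-hour service period even though S3 only opened in March 2006. It also pools all services together, as if a single hypothetical customer had been hit by every incident, so the figures are rough and pessimistic.

```python
from collections import defaultdict

HOURS_PER_YEAR = 365 * 24  # 8760; leap years ignored for simplicity

# (year, affected service, recovery time in hours), copied from the list above.
# Left out: June 5, 2008 (recovery time N/A) and June 6, 2008 (Amazon retail, not AWS).
INCIDENTS = [
    (2006, "S3", 6.0),
    (2007, "EC2", 4.0),
    (2008, "S3", 3.0),
    (2008, "S3", 8.0),
    (2009, "EC2", 8.0),
    (2009, "EC2", 2.0),
    (2009, "EC2, EBS", 19.0),
    (2009, "EC2", 0.75),  # 45 minutes
]


def downtime_by_year(incidents):
    """Sum the reported recovery times (in hours) for each calendar year."""
    totals = defaultdict(float)
    for year, _service, hours in incidents:
        totals[year] += hours
    return dict(totals)


def availability(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Fraction of the period during which the service was up."""
    return 1.0 - downtime_hours / period_hours


if __name__ == "__main__":
    for year, hours in sorted(downtime_by_year(INCIDENTS).items()):
        print(f"{year}: {hours:.2f} h down, availability {availability(hours):.4%}")
```

Because the list covers only the major publicized events, any availability figure derived this way reflects reported incidents only, not measured uptime.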

(Responsible editor: Lu Guang)
