Cloud computing is still uncharted territory for the users and vendors adopting the new infrastructure, so failures and unexpected downtime are inevitable. Even the biggest and best cloud providers cannot prevent their services from occasionally going down.
So what went wrong in each cloud outage, and what can IT managers and users learn from every incident? To help readers run their own services better, the outages below are grouped by vendor and each is rated by severity.
Microsoft
Cloud services can suffer unexpected downtime even during testing. That is what happened to Microsoft in March 2009, when Azure went down for 22 hours. Because Azure was still in its trial period, only test applications were affected, so there was no significant loss.
The Azure outage came very early in the evolution of cloud computing, but IT managers already knew that disaster preparedness and downtime planning are a wise first step in the cloud. Still, with Azure in its infancy, no one knew how much the outage would affect cloud computing or people's confidence in it.
Severity: Low
At 5:45 Pacific Time on February 28, 2012, Azure users around the world lost access to Azure's service management functions. For most users, the outage did not affect the delivery of their services, which they could still manage. Microsoft stated that most customers had service management restored within 12 hours of the downtime triggered by the 2012 leap day, although for many users it took a full 24 hours from the start of the outage before everything ran smoothly again.
Severity: Low
Rackspace
In June 2009, Rackspace, a managed-hosting company turned cloud provider, suffered a serious outage: a power trip was followed by a series of backup generator failures, and most of its server racks went dark. This is no exaggeration.
To protect its credibility, Rackspace reported the entire incident on its official blog and posted updates on Twitter, but critics still spread the #rackspacefail hashtag everywhere.
Severity: High
When Rackspace suffered another serious outage in November 2009, however, the bad reviews did not fly. In fact, Rackspace's customers had every opportunity to hurt the vendor over the downtime, yet they described it as "no big deal." This suggests that Rackspace kept managing the situation well, providing the right updates and quick fixes.
One customer said that after its business went offline for 15 to 20 minutes, Rackspace was very transparent and dealt with the problem quickly. The incident earned the company goodwill and also defused its public relations crisis. If no important data is lost and the service recovers quickly, customers stay satisfied, or at least accept it. Indeed, even with vendors that boast of "100% uptime," most customers do not seem to leave because of an occasional accident.
Severity: Low
Salesforce.com
In January 2010, Salesforce.com's 68,000 customers suffered at least one hour of downtime.
The company reported a "system failure" in its data center, and everything, including backups, was unavailable during this period. This drew some negative attention, and Salesforce.com's lock-in strategy for Force.com became a target of criticism: Force.com is a platform-as-a-service (PaaS) product that cannot be used outside Salesforce.com. So when Salesforce.com has a problem, Force.com goes down with it.
The downtime did not do the company much harm. Its VMforce partnership with VMware arrived that spring, and less than a month after the outage Marc Benioff was boasting that Salesforce.com "is the largest enterprise cloud computing company." The company did not seem to care about the outage at all.
Severity: Medium
Heroku
Heroku is a PaaS provider for the Ruby programming language, with an estimated 44,000 applications running on it. In January 2010, the high-capacity Amazon EC2 instances it ran on, worth some $20,000, went down.
Amazon brought these instances back within an hour, but Heroku's developers were hit hard. Heroku ran all of its instances in a single availability zone, which turned the failure into a major, complete service outage and exposed its lack of cloud computing best practices; downtime like this hinders its continued growth.
Heroku learned this lesson the hard way, and it came to regard the event as a lesson "of the highest order" in dealing with cloud services.
Severity: High
Terremark
Back in March, VMware partner Terremark suffered seven hours of downtime involving connectivity problems, which put the future of vCloud Express in jeopardy. Only 2% of customers were reportedly affected, but those who were expressed extremely strong dissatisfaction with how the company handled the matter.
When customers complained, a Terremark spokesman described the company as a "mum" hosting provider. Most striking of all, he compared Terremark's response to Amazon's, which amounts to telling customers that when they choose between providers, status reports and service alerts count.
Of course, once vCloud Director arrived and the excitement of VMworld 2010 died down, the Terremark outage does not seem to have left much of a mark.
Severity: Medium
Amazon
Compared with the outages at Amazon Web Services, every other cloud computing outage looks like child's play. As the originator of cloud services, Amazon has suffered a fairly even mix of minor service interruptions and genuine disasters over the past few years.
A rare incident in June 2009 left some customers without the Amazon EC2 service for 5 hours, but most customers viewed it as a growing pain. This surprisingly tolerant response did not last: Amazon's disaster-response coordination and customer relationships began to suffer after a distributed denial-of-service attack and lengthy e-mail exchanges.
Severity: High
Another incident, in which a thunderstorm knocked out systems at Amazon's data center in Virginia for 6 hours, showed from a different angle how the company had matured, and Amazon's response time deserved credit.
Severity: Medium
As cloud computing keeps growing and expanding, problems follow. In May, a series of seemingly unrelated incidents at Amazon's Virginia data center caused three separate outages within one week. In the first, the uninterruptible power supply (UPS) failed to switch over to backup power, and a rack of servers went down. The second, four days later, was a short circuit in a power distribution box that interrupted service for 8 hours. In the last, two days after that, a car hit a utility pole and cut power to the data center, causing half an hour of downtime. Whether or not the incidents were related, and although none was a major event on its own, three outages in such a short period cannot be a trivial matter for any vendor.
Severity: High
Interestingly, most customers seem open-minded about Amazon Web Services. They accept the complexity of Amazon's technology and the potential for unexpected problems; most importantly, they agree that the Amazon cloud is reasonably priced and delivers the value they want for their work.
Amazon has also lived up to its customers' expectations. Despite continued outages, it has shown a maturity that matches its prices, and it responded quickly to the April 2010 downtime with a very long blog post, regularly updated AWS status pages, and a newsletter explaining the causes of the downtime and how they were resolved.
Severity: Medium
In April 2011, Amazon's cloud computing center in Northern Virginia crashed, affecting a number of sites, including the Q&A service Quora, the news service Reddit, HootSuite, and the location-based service Foursquare.
Surprisingly, although Amazon's cloud services were disrupted for nearly 4 days, the outage did not violate the service level agreement (SLA) for Amazon EC2. The Amazon FAQ explains that the SLA guarantees a Region 99.95% availability over a 365-day service period. This time, several affected users complained that Amazon did not release timely updates about the outage. Was this a step backward?
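The SLA figure is easier to judge as a downtime budget. The sketch below is a back-of-the-envelope calculation, assuming availability is simply uptime divided by total time over a 365-day period. Amazon's actual SLA defines unavailability more narrowly, which is presumably how a multi-day outage could stay within it. The helper names are hypothetical.

```python
# Downtime budget implied by an availability percentage.
# Assumption: availability = uptime / total time over the period;
# real SLAs define "unavailable" more narrowly than this.

HOURS_PER_DAY = 24


def allowed_downtime_hours(availability_pct: float, period_days: int = 365) -> float:
    """Maximum downtime (in hours) that still meets availability_pct."""
    total_hours = period_days * HOURS_PER_DAY
    return total_hours * (1 - availability_pct / 100)


def availability_pct(downtime_hours: float, period_days: int = 365) -> float:
    """Availability percentage achieved with downtime_hours of outage."""
    total_hours = period_days * HOURS_PER_DAY
    return 100 * (total_hours - downtime_hours) / total_hours


if __name__ == "__main__":
    budget = allowed_downtime_hours(99.95)
    print(f"99.95% over 365 days allows {budget:.2f} hours of downtime")
    # "Nearly 4 days" approximated here as 96 hours.
    print(f"96 hours down over 365 days gives {availability_pct(96):.2f}% availability")
```

At 99.95%, the naive budget is about 4.4 hours a year, so an outage of nearly 4 days (roughly 98.9% availability under this measure) would fall far short unless it did not count as unavailability under the SLA's own terms.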
Severity: High
Summary
As most cloud users know, downtime like this also happens frequently in enterprise data centers. The list above is not complete; other reports cover further incidents. Cloud computing is not perfect, and more downtime will keep happening. All the top vendors can do is learn where things went wrong and fix the problems, or dark-horse companies with better track records may unseat today's cloud leaders.
(Responsible editor: The good of the Legacy)