Even heroes stumble over "security": a tally of cloud computing security incidents

Since ancient times, as the saying goes, even heroes find it hard to get past a beauty. The whole world can see a hero's remarkable feats, yet the hero still stumbles at that one hurdle. The same holds today for international IT giants such as Google, Amazon and Microsoft: however powerful they once seemed, they appear helpless when faced with cloud computing "security". Since the day cloud computing services were born, security incidents have erupted frequently, leaving already wary users even more uneasy.

Just last month, Amazon, the cloud computing services provider, suffered the biggest outage in its history. In the early hours of April 21, Amazon's cloud computing center in Northern Virginia crashed, affecting a number of sites, including the question-and-answer service Quora, the news service Reddit, HootSuite and the location tracking service Foursquare.

These sites all run on Amazon's cloud computing center. Quora was inaccessible in the UK on Thursday morning and afternoon. Like Foursquare and many other sites, it is hosted entirely on Amazon's EC2 (Elastic Compute Cloud) service.

As a result, the HootSuite website responded slowly and Reddit's search service was unavailable. Reddit said on its website that Amazon was experiencing degraded service. The Amazon cloud service interruption lasted nearly 4 days; by press time, HootSuite, Reddit, Foursquare, Quora and other sites had basically returned to normal.

According to analysis at the time, Amazon's cloud computing status page showed a fault in the Northern Virginia cloud computing center, which serves many Web 2.0 companies. The outage began at about 1:40 A.M. on the US west coast (9:40 A.M. British Summer Time), and the trouble continued from then on.

Analysts say the Northern Virginia center is one of many cloud computing centers that Amazon operates. As a general rule, such systems are designed so that an outage at one center does not disrupt the other centers and does not affect users of the service.

This time, however, Amazon did not route around the failed Northern Virginia center and shift its workload to its many other cloud computing centers, which is puzzling. Server downtime is not normally expected to be this serious. In the simplest arrangement, two-machine hot standby, when one server goes down the other takes over within a short time and users' services are unaffected. Yet this failure at Amazon's cloud computing center disrupted normal cloud services for a great many users. It struck the elastic cloud that Amazon is so proud of, and it dealt a huge blow to the trust that cloud computing services had only just built up.
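The two-machine hot standby idea mentioned above can be sketched in a few lines. This is only a minimal illustration, not a description of how Amazon or any other provider implements failover; the `Server` and `HotStandbyPair` classes are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """A hypothetical server that can be marked healthy or down."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class HotStandbyPair:
    """Send requests to the active server; promote the standby on failure."""

    def __init__(self, primary: Server, standby: Server) -> None:
        self.active = primary
        self.standby = standby

    def handle(self, request: str) -> str:
        try:
            return self.active.handle(request)
        except ConnectionError:
            # Fail over: swap roles, then retry once on the new active server.
            # If the standby is also down, the error propagates to the caller.
            self.active, self.standby = self.standby, self.active
            return self.active.handle(request)


pair = HotStandbyPair(Server("primary"), Server("standby"))
first = pair.handle("GET /")
pair.active.healthy = False  # simulate the primary crashing
second = pair.handle("GET /")
print(first)
print(second)
```

The sketch also shows the limit of the approach: it protects against one machine failing, not against a fault that takes down both machines, or a whole center, at once.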

After emergency repairs, Amazon's cloud services returned to normal. The incident's damage to users, however, is far-reaching, and users have cried out that they were "hurt".

Fortunately, Amazon's attitude has been candid. On April 30, Amazon issued an apology of more than 5,700 words to users for the outage, saying that it had identified the vulnerabilities and design flaws involved and hoped to improve the competitiveness of EC2 (Amazon Elastic Compute Cloud) by fixing them. Amazon has made some repairs and adjustments to EC2, and plans to roll them out more widely over the next few weeks so that all services are improved and similar incidents are avoided.

In terms of compensation, Amazon says it will give users affected by the failure a 10-day service credit, which will be applied automatically to their accounts. However, it offers no legal guarantee as to how similar incidents will be avoided in the future.

It is understood that although the Amazon cloud service interruption lasted nearly 4 days, it did not legally violate the service level agreement (SLA) for Amazon EC2. Amazon's explanation is that the services that failed were EBS and RDS, not EC2, so the EC2 SLA was not breached. Moreover, the remedy Amazon proposes for downtime, keeping backups at multiple sites, is only a technical recommendation and not a contractual guarantee. None of this seems likely to give cloud service users confidence.
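To put a 4-day interruption in perspective, the arithmetic behind an availability target can be worked through as follows. The 99.95% annual figure used here is an assumed target chosen for illustration, not a number taken from this article, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def allowed_downtime_hours(availability_pct: float) -> float:
    """Hours of downtime per year permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def achieved_availability_pct(downtime_hours: float) -> float:
    """Yearly availability, assuming this was the only downtime all year."""
    return 100 * (1 - downtime_hours / HOURS_PER_YEAR)


ASSUMED_SLA_PCT = 99.95  # assumption for illustration only
budget = allowed_downtime_hours(ASSUMED_SLA_PCT)

outage_hours = 4 * 24  # "nearly 4 days", rounded up to 96 hours
achieved = achieved_availability_pct(outage_hours)

print(f"Downtime budget at {ASSUMED_SLA_PCT}%: {budget:.2f} hours per year")
print(f"Availability after a {outage_hours}-hour outage: {achieved:.2f}%")
```

Under that assumed target the yearly downtime budget is only a few hours, so a multi-day outage exceeds it many times over. The point made above still stands, though: the numbers only matter contractually if the agreement covers the service that actually failed.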

On the face of it, the Amazon outage has a perfect ending: the vendor promptly fixed the flaws, apologized in writing and paid compensation. However, users' fear of cloud services will not be easy to dispel. Amazon may only be able to repair its damaged reputation gradually, by giving users not only technical but also institutional and legal assurances.

A tally of frequent cloud service incidents

In recent years, not only Amazon but also its competitors in cloud computing, such as Google and Microsoft, have frequently suffered cloud service "interruptions".

Event One: Global failure of Google's Gmail

Gmail is a free mail service that Google launched on April Fool's Day 2004. Since its launch, the service's occasional "interruptions" have been a widely discussed topic in the industry.

On February 24, 2009, Google's Gmail suffered a global failure, with service interrupted for up to 4 hours. Google explained the cause as follows: during routine maintenance of a data center in Europe, some new code (which tried to keep data geographically close to its owner) had side effects that overloaded another European data center. The ripple effect then spread through the interfaces to other data centers, eventually causing a global outage and leaving the other data centers unable to function.
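The ripple effect described above, where one overloaded data center pushes its traffic onto the others until they fail in turn, can be illustrated with a toy model. This is not Google's actual architecture: the data center names, loads and capacities are invented for the example, and the rule that a failed center's load is split evenly among the survivors is a simplifying assumption.

```python
def simulate_cascade(loads, capacities, first_failure):
    """Return the data centers that fail, in order, after one goes down.

    When a center fails, its load is split evenly among the centers that
    are still alive; any center pushed over its capacity fails next.
    """
    loads = dict(loads)  # work on a copy so the caller's data is untouched
    alive = set(loads)
    failed = []
    queue = [first_failure]
    while queue:
        dc = queue.pop(0)
        if dc not in alive:
            continue
        alive.remove(dc)
        failed.append(dc)
        if not alive:
            break
        share = loads.pop(dc) / len(alive)
        for other in sorted(alive):
            loads[other] += share
            if loads[other] > capacities[other] and other not in queue:
                queue.append(other)
    return failed


# Hypothetical load figures (arbitrary units) for four data centers.
loads = {"eu-1": 60, "eu-2": 70, "us-1": 70, "asia-1": 70}

# With spare capacity, the failure of eu-1 is absorbed by the others.
roomy = simulate_cascade(loads, {dc: 100 for dc in loads}, "eu-1")

# With little headroom, the same failure takes every center down.
tight = simulate_cascade(loads, {dc: 85 for dc in loads}, "eu-1")

print(roomy)
print(tight)
```

The two runs differ only in headroom, which is the design point: spare capacity in each center is what stops a local overload from becoming a global one.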

A few days after the incident, Google announced that it would compensate paying Google Apps Premier Edition customers, such as companies and government agencies, for the interruption with 15 days of free service, worth 2.05 U.S. dollars per user.
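The 2.05-dollar figure is consistent with a simple pro-rata calculation. The sketch below assumes an annual price of 50 U.S. dollars per user, which is not stated in this article, and the function name is hypothetical.

```python
def prorated_credit(annual_price: float, free_days: int, days_per_year: int = 365) -> float:
    """Value of a number of free service days at a given annual price."""
    return annual_price * free_days / days_per_year


ASSUMED_ANNUAL_PRICE_USD = 50.0  # assumption, not taken from the article
credit = prorated_credit(ASSUMED_ANNUAL_PRICE_USD, free_days=15)
print(f"15 free days are worth about ${credit:.2f} per user")
```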

Event Two: Microsoft's cloud computing platform Azure stops running

On March 17, 2009, Microsoft's cloud computing platform Azure stopped running for about 22 hours.

Although Microsoft did not give details of the cause, industry analysts attributed the Azure outage to failures in its central processing and storage equipment. The outage could raise security concerns among Microsoft customers about the cloud computing platform, and it exposed a huge risk in cloud computing.

However, Azure was still in its "pre-release test" phase at the time, so problems of this kind were understandable. The early exposure of security issues seems to have served as a wake-up call to Microsoft's Azure team that security is the most important aspect of a cloud computing platform.

In 2010, the Azure platform went into commercial operation and became one of developers' favorite cloud platforms.

Event Three: Rackspace cloud service outage

In June 2009, Rackspace suffered a severe cloud service outage. Power supply equipment tripped, a backup generator failed, and servers in many racks went down. The accident had serious consequences.

To save the company's reputation, Rackspace posted updates on its blogs and discussed the whole incident in detail. Users, however, were not placated.

In November of that year, Rackspace again had a major service outage. Its users had every opportunity to denounce the supplier after the outage, but they said the accident was "not a big deal". Rackspace may not have had much luck, but it continued to provide ample updates and fixed the errors quickly.

Sachin Agarwal, one of the founders of the blog service provider Posterous, gave his own view after a service outage took his business offline for 15-20 minutes. Agarwal was not angry; instead, he said Rackspace had been "very transparent" about the matter and had dealt with the problem promptly and properly.

It seems that as long as no data is seriously lost and the service recovers quickly, users still have a good experience. Whatever the promise of so-called "100% uptime", most users will not abandon a supplier over occasional minor accidents, so long as the problems do not pile up.

Event Four: Salesforce.com downtime

In January 2010, almost 68,000 Salesforce.com users experienced at least 1 hour of downtime.

A "system error" in Salesforce.com's own data center temporarily paralyzed all of its services, including backups. The incident also exposed Salesforce.com's lock-in strategy: its PaaS platform, Force.com, cannot be used outside Salesforce.com, so whenever Salesforce.com is in trouble, Force.com has problems too. When the service is down for a long time, the problem therefore becomes tricky.

The outage did not have much impact on the company. It went ahead with VMforce, its collaboration with VMware, in the spring, and one month after the outage Salesforce.com's chief executive declared that Salesforce.com was "the largest cloud computing enterprise".

The outage led people to question Salesforce.com's software lock-in, which ties the company's Force.com platform to Salesforce.com's own services. In short, though, the incident is just one more reminder that 100% reliable cloud computing services do not exist yet.

Event Five: Terremark downtime events

In March 2010, VMware partner Terremark suffered seven hours of downtime, leading many customers to question its enterprise-class vCloud Express service. The outage nearly ruined vCloud Express's future. Affected users said the failure was caused by a "lost connection". The outage reportedly affected only 2% of Terremark's users, but it paralyzed service for those users. In addition, users were very dissatisfied with the way the supplier handled the matter.

Terremark's official explanation was: "Terremark lost connectivity, which caused the vCloud Express service interruption in the Miami data center." The key question is how Terremark handled the emergency: the company had no clear plan, gave users only vague assurances and sent updates to the affected users. If a hosting provider wants to convince business users to rely on its services at critical moments, this is not the way to go.

John Kinsella, founder of Protected, a Terremark corporate customer, said the interruption was disappointing and complained that the supplier acted like a "grocery store custodian". Kinsella compared Terremark with Amazon, complaining that Terremark was only beginning to consider the status reports and service alerts that Amazon had already implemented.

Of course, after the hype over vCloud Director and the excitement of VMworld 2010, the Terremark outage seems to have left only minor fallout.

Event Six: Intuit service interruption due to power outage

In June 2010, Intuit's online billing and development services collapsed, leaving the company baffled. Online products, including Intuit's own home page, were paralyzed for nearly two days, and users were amazed that such a wide-ranging outage could happen in an era of complete backup solutions and disaster recovery tools.

But that was only the beginning. About 1 month later, Intuit's QuickBooks Online service was paralyzed after a power outage. This outage lasted only a few hours, but a second failure within such a short period was also cause for concern.

Even though some users called for taking up "arms" against the brand, Intuit still has 4 million users and continues on its way to becoming a PaaS and Web services provider. The company does not have the visibility of Amazon or Rackspace, so the interruptions did not have much impact. Intuit is known mainly for Quicken.

Event Seven: Microsoft BPOS service interruption

In September 2010, Microsoft apologized to users for at least three hosted service outages in the western United States. This was the first major cloud computing incident to hit Microsoft.

During the incident, customers accessing the BPOS (Business Productivity Online Suite) service through Microsoft's North American facilities may have encountered a problem that lasted two hours. Microsoft engineers later claimed to have solved the problem, but they had not fixed the root cause, which led to further service disruptions on September 3 and September 7.

Microsoft's Clint Patterson said the data breach was caused by an unspecified configuration issue in Microsoft's data centers in the US, Europe and Asia. In "very specific circumstances", the Offline Address Book in the BPOS software was made available to unauthorized users. The address book contains an enterprise's contact information.

Microsoft says the error was fixed two hours after it was discovered. It also says it has tracking facilities that let it contact those who mistakenly downloaded the data so that the data can be cleaned up.

This series of incidents at Microsoft has raised concerns among those considering cloud computing, especially those considering Office 365, Microsoft's main cloud computing product, which is bundled with the Office suite. Evidently even a company as famous as Microsoft seems helpless when faced with the security problems of providing public cloud services. It is therefore hard to believe the industry's claim that 2011 will be the year of cloud computing applications.

Event Eight: Another large-scale loss of Gmail user data

In March 2011, Google's mail service suffered another large-scale user data loss incident. About 150,000 Gmail users found on a Sunday that all of their mail and chat records had been deleted, and some found that their accounts had been reset. Google said the affected users made up about 0.08% of the total.

Google's Google Apps status page said: "Service has been restored for some Google Mail users, and we expect a solution for all users in the near future." It also reminded affected users that they might be temporarily unable to log in to the mail service while their accounts were being repaired.

Google has had failures before, but this is the first time entire accounts have disappeared. There was a 2.5-hour service stoppage in 2009, when many people complained to Google that they needed the system for work. Successive errors left users worldwide unable to receive e-mail for hours. Google, Microsoft and other technology companies have been developing cloud computing in recent years in the hope of attracting corporate customers, but the many cloud storage accidents may undermine user confidence.

Event Nine: Widespread server downtime at Amazon's cloud data center

On April 22, 2011, servers in Amazon's cloud data center went down on a large scale. The incident is considered the most serious cloud security incident in Amazon's history.

A number of sites were affected by the failure at Amazon's cloud computing center in Northern Virginia, including the question-and-answer service Quora, the news service Reddit, HootSuite and the location tracking service Foursquare.

On April 30, in response to the previous week's cloud service outage, Amazon published a 5,700-word report on its website on Friday, explaining the cause of the failure in detail and apologizing to users. Amazon also said it would give users affected by the failure a 10-day service credit, applied automatically to their accounts.

In its Friday report, Amazon said that it had identified the vulnerabilities and design flaws involved and hoped to improve the competitiveness of EC2 (Amazon Elastic Compute Cloud) by fixing them. Amazon has made some repairs and adjustments to EC2, and plans to roll them out more widely over the next few weeks so that all services are improved and similar incidents are avoided.

The event also raises a concern about moving infrastructure to the cloud: relying entirely on a third party for the availability of an application.

Has the limit of what users can bear been breached?

As early as May 2010, Accenture and the China Electronics Society published a report called "A pragmatic road to the development of cloud computing in China". The report notes that security issues are the biggest global challenge to cloud computing. Such concerns are particularly pronounced in China, "so much so that CIOs are treading on eggshells, especially when it comes to public cloud services."

Cloud security has always been a headache for governments and businesses around the world. If this hurdle can be cleared, cloud services can be applied successfully across a wide range of fields; if not, adoption will stall. It is therefore reasonable to conclude that Amazon's downtime will make it harder to promote cloud services across the globe, especially in China, where many companies and government bodies place more trust in the security of private clouds.

Downtime makes people think harder about the security problems facing the public cloud. Although the public cloud has a well-known cost advantage, users have to be wary about its security, compliance and quality of service. Since the data is hosted by a third party, customers want the service provider to ensure data security, with neither loss nor unauthorized access; to comply with regulatory requirements on storage systems and on where data is kept; and to provide low-latency, highly available services over the network.

The first wave of enterprise and government CIOs to "eat the crab", that is, to be the first to try something new, seem in many people's eyes to be committing "suicide".

However, to conclude from these downtime incidents alone that cloud computing is useless and should not be promoted seems rather arbitrary. Security incidents are not unique to cloud computing: any IT system faces security pressures, whether from natural disasters or man-made ones.

Indeed, not only cloud services but basic Internet services too cannot escape the "security" problem. Since it cannot be avoided, service providers can only face it squarely. Besides improving their architecture and technology, cloud service providers must keep building a service guarantee system for users.

I once interviewed the vice president of a domestic cloud service provider, who said half-jokingly: "I would very much like to take out insurance on the company's cloud services, so that when user data is lost or the service has problems, users can get real compensation."

As the saying goes: take people's money, and you must ward off their disasters! Many cloud service providers are actively looking for a service guarantee system that users can trust. It will take time for such a system to mature, and much progress may not be seen in 2011.

After this series of downtime incidents, we cannot help but ask: has the limit of what users can bear been breached? The answer is no. Even after serious downtime incidents, some users abroad remain keen on the public cloud services of companies such as Amazon. They see such incidents as rare accidents, like plane crashes: flying is still safer than travelling by car.

Weighing this against the cost of IT systems, some fast-growing companies are more willing to try innovative IT technologies and services in exchange for faster growth. They either already have coping strategies or are willing to accept the risk, even of data loss and leakage. So although cloud computing services still carry many uncertainties (natural disasters, man-made disasters), their revolutionary and innovative nature is undeniable, and they meet the development needs of some "light" companies. Once cloud computing clears the final hurdle of "security", it will usher in large-scale development.
