The worst cloud outages to date

2013's worst cloud computing outages

China's IDC Circle reported on July 23: Cloud computing offers many benefits to enterprises and ordinary users, but although the cloud lives in the "sky", it is not spared from "earthly mistakes." Cloud users know that, like any other technology, web-based services can crash. If the provider behind a service is competent, you should not lose any data, but you may still be severely affected while the service is down. Let's take a look at the worst cloud computing outages of 2013 so far.

Amazon homepage failure

Date: January 31, 2013; Duration: 49 minutes. Amazon's cloud computing service has had major outages before, but we rarely see the company's own Amazon.com home page fail. Early this year it happened: on an otherwise quiet January day, the Amazon.com home page displayed a plain-text error message for nearly an hour. The message itself, "HTTP/1.1 Service Unavailable", does not tell us what actually happened. Some claimed it was a denial-of-service attack, but those claims look doubtful. Although Amazon never formally commented on the incident, subsequent reports suggest that the culprit was probably an internal problem.

Amazon incident impact

An online retailer such as Amazon has to stay online for its business to function. Based on the company's previous quarterly earnings, some industry watchers estimate that an hour offline could have cost nearly 5 million dollars in revenue. Amazon did not say how it got the site back on track, only that the failure affected the home page, not the inner pages, and had no effect on its AWS cloud hosting operations.
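As a rough illustration of the arithmetic behind such estimates, the sketch below scales an hourly revenue figure linearly to the length of an outage. The function name `estimate_lost_revenue` is hypothetical, and the linear model is an assumption: it ignores time-of-day effects, the fact that only the home page was down, and customers who simply came back later.

```python
def estimate_lost_revenue(hourly_revenue_usd: float, outage_minutes: float) -> float:
    """Scale an hourly revenue figure linearly to the outage duration."""
    if hourly_revenue_usd < 0 or outage_minutes < 0:
        raise ValueError("revenue and duration must be non-negative")
    return hourly_revenue_usd * outage_minutes / 60.0


# Figures quoted in the article: roughly 5 million dollars per hour
# and a 49-minute outage. Both are estimates, not audited numbers.
loss = estimate_lost_revenue(5_000_000, 49)
print(f"Estimated loss: ${loss:,.0f}")
```

Under these assumptions, the 49-minute outage works out to a little over 4 million dollars.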

Dropbox service interruption

Date: January 10, 2013; Duration: about 16 hours. The main selling point of Dropbox is that you can treat it like your local hard drive, so when the service is unavailable for most of a day, the consequences are disastrous. That is what happened on January 10 this year: at around 3:30 p.m. Pacific time, Dropbox acknowledged that its service was failing, and the company told customers through Twitter that all client synchronization and file uploads would be unavailable for "the next hour". The problem was not resolved until 7:09 the next day.

Dropbox incident impact

Users who rely on Dropbox for file storage were very disappointed and voiced their dissatisfaction on Twitter. One user said: "Dropbox crashes, and users begin to realize that they cannot trust cloud services 100%." Dropbox did not disclose the exact cause of the incident, but Amazon issued a statement saying that it had nothing to do with Amazon's cloud computing services.

Facebook site interrupted

Date: January 28, 2013; Duration: hours. On the morning of January 28, Facebook users around the world found that they could not see their friends' status updates. Because so many people visit Facebook so often, hours of downtime could not go unnoticed. Earlier that month, the hacker group Anonymous had released a video claiming it would attack Facebook and take it offline on that same day. So what really happened?

Facebook incident impact

For hours, people could not get their friends' status updates. Facebook said the outage stemmed from a DNS problem that "prevented users who entered facebook.com in the browser from accessing the site", a problem that was easy to fix, and there was no indication that Anonymous was involved. The incident affected only Facebook's desktop website; its mobile site and apps were unaffected.
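A DNS failure of this kind shows up on the client side as a hostname that no longer resolves, even though the servers behind it may be healthy. The sketch below is a minimal resolution check built on Python's standard `socket.getaddrinfo`; the helper name `can_resolve` and the injectable `resolver` parameter are assumptions made for illustration, and nothing here reflects how Facebook diagnosed the incident.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address.

    `resolver` defaults to socket.getaddrinfo; it can be swapped for a
    stub so the check can be exercised without network access.
    """
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        # Raised by getaddrinfo when name resolution fails.
        return False
    return len(results) > 0
```

Calling `can_resolve("facebook.com")` from a monitoring script would distinguish "the name does not resolve" from "the site resolves but does not answer", which call for different fixes.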

Microsoft service interrupted, first wave

Date: February 1-2, 2013; Duration: about two hours. February was a tough month for Microsoft. On February 1, the company's Office 365 suite and Outlook.com mail service went down, and users could not access either service for about two hours. A day later, Microsoft's Bing search engine also suffered nearly two hours of downtime. What were users to do? Switch to Google, of course.

Microsoft incident impact

After the Office 365 and Outlook.com failures, user forums and social media sites filled up with complaints. As for the Bing problem, users who rely on Bing must have been very disappointed. According to Microsoft, the outage was caused by "routine maintenance errors". More specifically, the root cause was a set of network configuration changes, and the company mitigated the impact by deploying the "necessary fixes".

Microsoft service interrupted, second wave

Date: February 22, 2013; Duration: more than 12 hours. Compared with the second interruption, the first was nothing. On the evening of February 22, the company's Windows Azure cloud storage service went down, and all secure access to it timed out. Other Microsoft services (such as Xbox Live, Xbox Music, and Xbox Video) also started to fail, leaving users unable to reach cloud-connected data or use any of the multimedia content bundled into these services.

Microsoft incident impact

Forums and social media sites once again filled up with customer complaints. Microsoft revealed that an expired SSL certificate was the root cause of this failure (really?!). The two interruptions were a real headache.
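An expired certificate is one of the few outage causes that can be predicted to the second, which is why a simple expiry check is a common safeguard. The sketch below is one possible version: it takes a certificate's `notAfter` string (the format returned by Python's `ssl.SSLSocket.getpeercert()`) and reports whether renewal is due. The helper names, the 30-day threshold, and the sample date are all hypothetical; nothing here describes Microsoft's actual tooling.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and the certificate's notAfter timestamp.

    `not_after` uses the certificate date format, e.g.
    "Jan  1 00:00:00 2030 GMT". `now` must be timezone-aware.
    A negative result means the certificate has already expired.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now.timestamp()) / 86400.0


def needs_renewal(not_after: str, now: datetime, warn_days: float = 30) -> bool:
    """True if the certificate expires within `warn_days` (assumed threshold)."""
    return days_until_expiry(not_after, now) <= warn_days


# Hypothetical certificate expiring at the start of 2030.
sample_not_after = "Jan  1 00:00:00 2030 GMT"
check_time = datetime(2029, 12, 22, tzinfo=timezone.utc)
print(days_until_expiry(sample_not_after, check_time))  # 10 days left
print(needs_renewal(sample_not_after, check_time))
```

Run on a schedule against every certificate in use, a check like this turns a surprise outage into a routine renewal ticket.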

Google Drive

Date: March 18-19, 2013; Duration: approx. 17 hours. On Monday, March 18, many users found that their Drive documents and files loaded slowly or timed out, a problem that lasted about three hours. A day later, a second Google Drive outage left some users unable to access the service for approximately two hours. Two days after that, Drive went down again for 12 hours, which really annoyed users.

Google Drive incident impact

As you can imagine, forums and social networking sites filled up with complaints. Google said the initial problem was related to a failure in the company's network control software. The system was apparently not balancing load properly, which caused unnecessary latency on the company's servers. That in turn led to problems in Drive's connection management system. Google pledged to fix the flaw, adjust its load-balancing settings, and ensure "greater isolation" between its network services. The company also tweaked the Drive software to make the service "more resilient" to latency and during recovery.

Cloudflare website crashes

Date: March 3, 2013; Duration: about one hour. Cloudflare's business is helping customers protect and accelerate their sites, but on the morning of March 3, the company's own website and all of its services failed, taking down 785,000 other sites, including WikiLeaks, 4chan, and some government websites.

Cloudflare incident impact

For about an hour, anyone trying to reach a Cloudflare-connected site got a "can't route to host" error message. Cloudflare said the main cause was a system failure in its edge routers, which connect the Cloudflare network to the Internet. When a few routers crash, traffic is normally shifted to others, but in this case a bug took every router offline. Engineers identified the problematic code, removed it, and then waited for all the routers to restart across 23 data centers in 14 countries.

Dropbox fails again

Date: May 30, 2013; Duration: about 90 minutes. After nearly five months of normal operation, Dropbox failed again at the end of May. This time the service was down for about 90 minutes, leaving customers unable to access their files or upload new material.

Dropbox incident impact

After the 16-hour outage in January, people seemed able to accept that the service had gone down again. Luckily, this incident did not last long. Facing its second failure of 2013, Dropbox was calmer than the first time: it simply stated that its services had returned to normal and apologized for any inconvenience caused.

Twitter service outage

Date: June 3, 2013; Duration: approx. 45 minutes. On June 3, Twitter users could not access the service to send or read content. After about 25 minutes the service recovered, but it remained slow.

Twitter incident impact

While Twitter was unavailable, Google+ usage may have peaked, with everyone asking everyone else whether Twitter was working. Twitter said an error in a "routine change" brought the Fail Whale to the site. Once engineers identified the problem, the faulty change was rolled back and the service quickly returned to normal. (Compiled by 邹铮) Thanks for reading! Let's hope outages like these become less severe.



