As cloud computing has developed, its technology and emergency-response mechanisms have become increasingly mature. In most cases, cloud platforms keep running normally and stably.
However, whether because of weather or other causes, cloud vendors large and small around the world have suffered a number of "famous" outages in recent years:
1. In June 2009, the Amazon EC2 service was down for 5 hours due to distributed denial of service (DDoS) attacks.
2. In June 2009, a fault in power supply equipment tripped Rackspace's power and the backup generator failed, shutting down servers on a large scale.
3. In May 2010, Amazon's Virginia data center suffered three separate outages within a single week. First, the uninterruptible power supply (UPS) failed to switch to backup power, shutting down an entire rack of servers. Four days later, a short circuit in a power distribution box interrupted service for 8 hours. Two days after that, a car struck a utility pole and cut power to the data center, causing a half-hour outage.
4. On April 22, 2011, a technical fault interrupted many Amazon services in the eastern United States. The outage lasted about four days and is considered the most serious cloud computing incident in Amazon's history.
5. On February 28, 2012, a leap-year bug caused a large-scale, global Microsoft Azure service interruption lasting more than 24 hours.
6. On August 18, 2014, after the release of a Windows 8.1 security patch, technical issues interrupted Azure services for some users for up to 5 hours. Microsoft reported that Azure services such as Virtual Machines, Websites, Automation, Backup, and Site Recovery were disrupted in multiple locations.
7. In November 2014, a problem with the storage service in Azure's major Regions caused an 11-hour outage. It affected 19 Azure services across 12 Regions; apparently only the Australian data centers were spared.
8. At noon on November 2, 2014, Tencent Cloud servers in Shanghai and Guangzhou failed, leaving users of those servers unable to log in properly and with unstable connections. The fault lasted about two hours.
9. On June 6, 2015, a thunderstorm caused a power failure in the machine room of Ruijiang Technology, Qingyun's service provider, and all hardware in Qingyun's Guangdong 1 (GD1) zone shut down and restarted unexpectedly. The Qingyun website and console could not be accessed, and user services deployed in GD1 were unavailable.
10. At 10:22 am on July 6, 2016, abnormal network equipment in Availability Zone A of Alibaba Cloud's China North 2 region disrupted access to some products. The fault lasted about 1 hour.
Conclusion | To all cloud computing users
When you entrust your IT infrastructure to a cloud provider, don't forget that you are still the owner of these systems.
Mike Elgan, a senior technology journalist at eWeek, a well-known U.S. computer weekly, once said: "Cloud computing is not a panacea. We are just renting someone else's computers, so the problems that can occur in our own data centers can occur in the cloud as well." He advised that "it is important for companies to have their own alternatives."
Netflix's engineers believe that every system must be able to survive on its own, whatever the circumstances. They therefore design each system to anticipate failures in the other systems it depends on and to tolerate those failures.
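One common way to tolerate a failing dependency is the circuit-breaker pattern combined with a fallback. Netflix's open-source Hystrix library popularized it, but the sketch below is not Hystrix's API and not necessarily how Netflix does it; the class `CircuitBreaker`, the dependency `flaky_recommendations`, and the thresholds are all hypothetical, chosen only to show the idea.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical minimal circuit breaker (not a real library's API).

    After `max_failures` consecutive failures the circuit opens and calls go
    straight to the fallback. Once `reset_after` seconds have passed, one trial
    call is let through: success closes the circuit, failure re-opens it.
    """

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after:
            return fallback()  # open: skip the failing dependency entirely
        try:
            result = primary()  # closed, or half-open trial call
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        self.failures = 0  # success: close the circuit
        self.opened_at = None
        return result


def flaky_recommendations() -> list:
    # Hypothetical dependency that is currently down.
    raise ConnectionError("recommendation service unavailable")


if __name__ == "__main__":
    breaker = CircuitBreaker(max_failures=2, reset_after=10.0)
    for _ in range(3):
        # Each call degrades to a static default instead of failing the caller.
        print(breaker.call(flaky_recommendations, lambda: ["popular titles"]))
```

The injectable `clock` parameter exists only so the timing can be tested deterministically; in practice the thresholds would be tuned per dependency.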
From two data centers in two locations, to active-active deployments within one city, to multi-active deployments across cities, financial institutions, governments, and large and medium-sized enterprises have always followed the principle of "don't put all your eggs in one basket", and their disaster-recovery setups have grown ever more sophisticated.
But such high-end setups come at a high cost that is too burdensome for small and medium-sized enterprises (SMEs). Cloud computing, however, offers the corresponding building blocks: Regions and Availability Zones (AZs). For your own cloud business, you can therefore spread deployments across multiple Availability Zones and across multiple geographic Regions.
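To make "spread across multiple Availability Zones" concrete, the sketch below assigns replicas to zones round-robin and then checks whether losing any single zone still leaves enough replicas running. The zone and replica names and both functions are hypothetical and do not correspond to any provider's API; the same check applies one level up, with Regions in place of zones.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, List


def spread_replicas(replicas: List[str], zones: List[str]) -> Dict[str, str]:
    """Assign replicas to zones round-robin, so zone counts differ by at most one."""
    if not zones:
        raise ValueError("need at least one zone")
    return dict(zip(replicas, cycle(zones)))


def survives_zone_loss(placement: Dict[str, str], min_alive: int) -> bool:
    """True if losing any single zone still leaves at least `min_alive` replicas."""
    per_zone = Counter(placement.values())  # replicas hosted in each zone
    total = len(placement)
    return all(total - count >= min_alive for count in per_zone.values())


if __name__ == "__main__":
    zones = ["az-1", "az-2", "az-3"]  # hypothetical zone names
    replicas = ["web-1", "web-2", "web-3"]
    multi_az = spread_replicas(replicas, zones)
    single_az = spread_replicas(replicas, zones[:1])
    print(survives_zone_loss(multi_az, min_alive=2))   # True
    print(survives_zone_loss(single_az, min_alive=2))  # False
```

With all three replicas in one zone, a single zone failure (like several of the incidents listed above) takes everything down; spread across three zones, two replicas survive.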
In addition, the Internet industry has reached some basic consensus on how to achieve high availability, for example: split large systems into services; apply concurrency control and service isolation; use grayscale (canary) releases; build comprehensive monitoring and alerting; and degrade core services gracefully. Implemented well, these best practices go a long way toward improving system availability.
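A grayscale (canary) release exposes a new version to a small share of users first and widens that share only while monitoring stays healthy. The sketch below assumes users are bucketed by a stable hash of a user ID; the salt `"release-2"`, the `"v1"`/`"v2"` labels, and the function names are hypothetical.

```python
import hashlib


def in_canary(user_id: str, percent: int, salt: str = "release-2") -> bool:
    """Deterministically place about `percent`% of users in the canary group.

    Hashing salt + user_id keeps each user's assignment stable across requests,
    and raising `percent` only adds users, never removes existing ones.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < percent


def route(user_id: str, percent: int) -> str:
    # Hypothetical router: canary users get the new version.
    return "v2" if in_canary(user_id, percent) else "v1"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share = sum(route(u, 10) == "v2" for u in users) / len(users)
    print(f"{share:.1%} of users routed to v2")
```

The printed share should land close to 10%. Changing the salt reshuffles which users are in the canary, which is useful when starting the next release.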