The Windows Azure compute service offers a 99.95% availability guarantee. In other words, applications deployed on Windows Azure are designed to stay online almost all of the time. The precondition, however, is that at least two instances are configured for each role deployed to Windows Azure. The Windows Azure platform continuously monitors the running status of the virtual machines. When a fault occurs, it uses hot migration technology to move the affected service to a healthy physical server within a very short time. Likewise, when a user updates the service or the platform updates the guest OS, Windows Azure ensures that the service is not interrupted. How is this achieved?
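To put the 99.95% figure in perspective, the short Python sketch below converts an availability percentage into the downtime it still permits. The function name and the 30-day month are assumptions made for illustration; they are not part of any Windows Azure API or of the official SLA wording.

```python
# Illustrative arithmetic only: converts an availability percentage
# into the maximum downtime it permits over a given period.

def allowed_downtime_minutes(availability_percent, period_days):
    """Return the downtime budget, in minutes, for the given period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)

monthly = allowed_downtime_minutes(99.95, 30)    # 30-day month assumed
yearly = allowed_downtime_minutes(99.95, 365)

print(f"per 30-day month: {monthly:.1f} minutes")
print(f"per 365-day year: {yearly / 60:.2f} hours")
```

Under these assumptions the budget is about 21.6 minutes per month, or roughly 4.4 hours per year.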
Implementation of high availability on the Windows Azure platform
The Windows Azure platform defines two kinds of runtime domain for all role instances running on it: the fault domain and the upgrade domain (called the update domain in the rest of this article). The high availability of the compute service is guaranteed by these two domains.
Instances in the same fault domain are likely to fail at the same time. In the physical architecture, a fault domain can be a single physical server in a Windows Azure compute node, or a rack that holds multiple servers. The Windows Azure fabric controller ensures that every role deployed on the platform is spread across at least two fault domains, as long as it has more than one instance. Because the number of servers in a Windows Azure data center is very large, it is then extremely unlikely that all instances of a role fail simultaneously. The fault domain is the basis on which the Windows Azure platform meets its high availability requirements.
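The placement rule described above can be sketched in a few lines of Python. This is a simplified model, not the real fabric controller algorithm: the round-robin strategy, the function names, and the default of two fault domains are all assumptions made for illustration.

```python
# Simplified model of fault-domain placement (hypothetical, not Azure code).

def assign_fault_domains(role_instances, fault_domain_count=2):
    """Place the instances of each role round-robin across fault domains.

    role_instances maps a role name to its instance count.
    Returns a dict mapping each role to a list of fault-domain ids,
    one entry per instance.
    """
    return {
        role: [i % fault_domain_count for i in range(count)]
        for role, count in role_instances.items()
    }

def surviving_instances(placement, failed_domain):
    """Count, per role, the instances outside the failed fault domain."""
    return {
        role: sum(1 for domain in domains if domain != failed_domain)
        for role, domains in placement.items()
    }

placement = assign_fault_domains({"WebRole": 2, "WorkerRole": 2})
for failed in (0, 1):
    print(f"fault domain {failed} fails ->", surviving_instances(placement, failed))
```

With two or more instances per role, every role keeps at least one running instance whichever single fault domain fails. With only one instance the guarantee disappears, which is why the platform requires more than one instance.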
Similar to the fault domain, resources (servers, virtual machines, and so on) in the same update domain are updated at the same time. The platform also ensures that, when a role has more than one instance, the instances are not all placed in the same update domain. As a result, an update never affects all instances of a role at once, whether it is an update triggered by the user through the developer portal or a guest OS update carried out by the platform. A deployment on a server is unavailable while that server is being updated, but with this design a role as a whole does not become unavailable, because at least one instance sits in another update domain where no update is in progress.
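The domain-by-domain update can be modeled as a simple loop. The sketch below is only an illustration of the idea: the instance names, the dictionary layout, and the function name are hypothetical and do not correspond to any Windows Azure API.

```python
# Simplified model of a rolling update across update domains
# (hypothetical, not Azure code).

def rolling_update(instance_domains):
    """Yield (domain being updated, instances still serving) for each step.

    instance_domains maps an instance name to its update-domain id.
    Only one update domain is taken offline at a time.
    """
    for domain in sorted(set(instance_domains.values())):
        serving = [name for name, d in instance_domains.items() if d != domain]
        yield domain, serving

# Assumed example: one role with two instances in two update domains.
instances = {"WebRole_IN_0": 0, "WebRole_IN_1": 1}

for domain, serving in rolling_update(instances):
    print(f"updating domain {domain}; still serving: {serving}")
```

At every step of the loop at least one instance remains in the serving list, which is the property the update domain design is meant to provide.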
The following example illustrates fault domains and update domains. Assume that the current Windows Azure project has two roles, a web role and a worker role, each with four instances. During deployment, the Windows Azure platform allocates four compute nodes and ensures that the instances of each role are spread across at least two fault domains, that is, two server racks.
At the same time, the instances of each role are spread across different update domains. With this allocation, if fault domain 0-V2 has a problem, the whole system keeps running normally, because fault domain 1-V2 also contains web role and worker role instances. Similarly, if you need to update the hosted service, the platform first updates update domain 0. During that time the two web role instances and the two worker role instances in domain 0 cannot work, but the instances in domain 1 continue to work as usual. The platform updates domain 1 only after the instances in domain 0 have been updated, and by then the system can use the updated domain 0. In this way, the service remains available in either situation.
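The scenario above (two roles with four instances each, two fault domains, two update domains) can be checked with a small Python model. The exact mapping of instances to domains is an assumption, since the text does not spell it out: here instance i of each role is placed in fault domain i % 2 and update domain (i // 2) % 2. All names are hypothetical.

```python
from itertools import product

ROLES = ("WebRole", "WorkerRole")
INSTANCES_PER_ROLE = 4

# Assumed layout (not given in the text): instance i of each role goes to
# fault domain i % 2 and update domain (i // 2) % 2.
layout = [
    {"role": role, "index": i, "fault_domain": i % 2, "update_domain": (i // 2) % 2}
    for role, i in product(ROLES, range(INSTANCES_PER_ROLE))
]

def available(layout, key, domain):
    """Count instances per role that stay up while `domain` of kind `key` is down."""
    counts = {role: 0 for role in ROLES}
    for instance in layout:
        if instance[key] != domain:
            counts[instance["role"]] += 1
    return counts

print("fault domain 0 fails:      ", available(layout, "fault_domain", 0))
print("update domain 0 is updated:", available(layout, "update_domain", 0))
```

In both cases each role keeps two of its four instances, which matches the description: two web role instances and two worker role instances go offline while the other two of each role keep serving.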
Reference: Xu Ziyan, "Detailed Technical Explanation of the Microsoft Cloud Computing Platform Windows Azure", Electronic Industry Press.