Cloud infrastructure based on untrusted server nodes
Many enterprises now offer cloud computing services to users. A practical cloud computing infrastructure has to solve a number of key technical problems: in the cloud computing era, IT staff must coordinate the largest server farms ever built and keep the entire system running without interruption. Users are moving large amounts of critical data, key business processes, and critical applications to the cloud, so security, high availability, and high performance are basic requirements of a cloud computing system.
A cloud computing platform is a complex system that draws on a wide range of technologies, so developing one places high demands on a company's technical and financial strength. At this stage, few enterprises can genuinely develop a cloud computing platform. Many of the companies that claim to provide cloud computing services are in fact repackaging their existing technology and resources as cloud-like offerings, such as host leasing and network storage.
Coordinating tens of thousands or even millions of servers, while providing developers with a rich API and meeting the needs of both ordinary and enterprise-level users, is a very difficult problem. A practical cloud computing system is therefore a complex piece of systems engineering that a single company may not be able to complete on its own. It requires server providers, storage device providers, system platform providers, network device providers, network bandwidth providers, and application developers to work together.
Judging from Google's experience and the likely size of future server clusters, a server model in which individual servers are expected to fail matches reality. Under this model, a single server can be treated as an untrusted node, and the system design must mask the failure of untrusted server nodes inside the system so that it is never passed on to developers or ordinary users.
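The idea of masking a node failure inside the system can be illustrated with a minimal sketch. It is not the design used by any particular cloud platform; the names `NodeFailure`, `AllReplicasFailed`, `call_with_failover`, and the two example nodes are hypothetical and exist only for this illustration. The sketch assumes the same request can be served by several replica nodes.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class NodeFailure(Exception):
    """Raised by a (hypothetical) node handle when the node is down."""


class AllReplicasFailed(Exception):
    """Raised only when every replica of a request has failed."""


def call_with_failover(replicas: Iterable[Callable[[], T]]) -> T:
    """Try each replica in turn and return the first successful result.

    The failure of a single untrusted node is absorbed here and never
    reaches the caller; only the loss of all replicas is reported.
    """
    failures = 0
    for replica in replicas:
        try:
            return replica()
        except NodeFailure:
            failures += 1  # count the failure and move on to the next node
    raise AllReplicasFailed(f"{failures} replica(s) failed")


def broken_node() -> str:
    # Hypothetical node that is currently unreachable.
    raise NodeFailure("node-17 is unreachable")


def healthy_node() -> str:
    # Hypothetical node that answers normally.
    return "result from node-42"


if __name__ == "__main__":
    # The caller sees only the successful answer, not the failed node.
    print(call_with_failover([broken_node, healthy_node]))
```

A real system would add timeouts, monitoring, and data migration, but the principle is the same: the caller is told about a problem only when no node can serve the request.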
This chapter introduces a model of cloud computing infrastructure built on untrusted nodes. We do not expect the model to be perfect; we only hope it offers readers one way to approach building such a system.
Application Scenarios for cloud computing infrastructure
Before designing a cloud computing infrastructure, we must first establish the application scenario for the system, because the scenario determines the overall requirements that the actual implementation has to meet. A practical cloud computing infrastructure must be an architecture grounded in the real world: cloud computing lives by being user-oriented and user-centric. The cloud infrastructure in this chapter is designed with a real cloud computing center as its scenario, which includes the following main points.
(1) The number of nodes is very large and the failure probability of a single node is not negligible, so the probability that some node in the system has failed is very high.
This scenario requires the system to monitor and coordinate all nodes effectively, raise an alarm promptly when a node fails, report the details of the fault to the management node, and migrate the affected data and computation so that the system keeps running. Because the system contains a huge number of nodes, failed nodes are almost always present. As noted earlier, suppose a node has an annual failure rate of 0.1%, that is, one failure every 1,000 years. A cluster of 10,000 nodes can then expect about 10 node failures per year, and a cluster of 1,000,000 nodes can expect about 1,000 node failures per year, or roughly 3 failed nodes per day on average. In fact, an annual failure rate of 0.1% is very optimistic, since no server can run for 1,000 years before failing. With a higher, more realistic rate, hundreds of nodes would fail every day.
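The arithmetic behind these figures can be checked with a short sketch. It assumes that nodes fail independently and with the same annual failure rate, which is a simplification not stated in the text; the function names are chosen only for this illustration.

```python
def expected_failures_per_year(nodes: int, annual_failure_rate: float) -> float:
    """Expected number of node failures in one year."""
    return nodes * annual_failure_rate


def prob_at_least_one_failure(nodes: int, annual_failure_rate: float) -> float:
    """Probability that at least one node fails within a year.

    Assumes independent failures: 1 - (1 - p) ** n.
    """
    return 1.0 - (1.0 - annual_failure_rate) ** nodes


if __name__ == "__main__":
    rate = 0.001  # 0.1% per node per year, the figure used in the text
    for nodes in (10_000, 1_000_000):
        per_year = expected_failures_per_year(nodes, rate)
        per_day = per_year / 365
        at_least_one = prob_at_least_one_failure(nodes, rate)
        print(
            f"{nodes} nodes: {per_year:.0f} failures/year, "
            f"{per_day:.2f} failures/day, "
            f"P(at least one failure) = {at_least_one:.6f}"
        )
```

Raising `rate` to a few percent shows how quickly the daily failure count in a million-node cluster grows from a handful to hundreds.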
(2) A cloud computing center may be made up of multiple centers spread across regions.
Because the centers can be located in different places, coordination and communication between them are issues the system must address. A cross-zone cloud computing center can also offer a higher level of data security for storage: high-security data can be backed up across zones, and when a major node failure occurs, computing and storage can be migrated across zones, which gives the system higher availability.
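One way to picture cross-zone backup is a placement rule that spreads the copies of a piece of data over as many zones as possible. The following is a minimal sketch under that assumption, not a description of any real system; `place_replicas`, the zone names, and the node names are all hypothetical.

```python
from typing import Dict, List, Tuple


def place_replicas(
    nodes_by_zone: Dict[str, List[str]], copies: int
) -> List[Tuple[str, str]]:
    """Pick `copies` (zone, node) pairs, visiting zones round-robin.

    Every zone receives one copy before any zone receives a second one,
    so the loss of a whole zone destroys as few copies as possible.
    """
    zones = sorted(zone for zone, nodes in nodes_by_zone.items() if nodes)
    total_nodes = sum(len(nodes_by_zone[zone]) for zone in zones)
    if copies > total_nodes:
        raise ValueError("not enough nodes for the requested number of copies")

    placement: List[Tuple[str, str]] = []
    round_index = 0  # which node of each zone is considered in this round
    while len(placement) < copies:
        for zone in zones:
            nodes = nodes_by_zone[zone]
            if round_index < len(nodes) and len(placement) < copies:
                placement.append((zone, nodes[round_index]))
        round_index += 1
    return placement


if __name__ == "__main__":
    cluster = {
        "zone-a": ["a1", "a2"],
        "zone-b": ["b1"],
        "zone-c": ["c1"],
    }
    # Three copies land in three different zones.
    print(place_replicas(cluster, 3))
```

With three copies and three zones, each zone holds exactly one copy, so the data survives the failure of any single zone.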
(3) Users of the cloud computing system carry out both data-intensive and compute-intensive work, rather than storage or computing alone.
Many existing cloud computing systems do not support data-intensive and compute-intensive applications at the same time. Future cloud computing systems will face diverse users whose different needs must be served on the same system: some do scientific computing, some do information mining, some do simple office work, and some do image processing. Some of these applications are data-intensive, some are compute-intensive, and some are both, so a cloud computing system must respond flexibly to this diverse demand.
(4) A cloud computing infrastructure should have the ability to adapt to a wide variety of applications, with individual users, enterprise-level users, and developers working on the system.
A large number of applications run in the system at the same time, so the system must effectively isolate different applications and different users at both the hardware and software level. This ensures that services cannot affect one another and that one user's data cannot overwrite another's.
(5) Users who are not the designers or providers of the cloud infrastructure do not need to understand the hardware or software of any cloud computing center. They only need to use computing and storage resources on demand; to them, the cloud computing center is simply an on-demand resource pool.
Presenting an on-demand pool of computing and storage resources to ordinary users is a core technology in cloud computing, and virtualization is an important means of achieving it.
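From the user's side, an on-demand resource pool can be reduced to two operations: request resources and give them back. The sketch below models only that accounting and says nothing about how virtualization maps a request onto physical machines; `ResourcePool`, `InsufficientResources`, and the capacity figures are hypothetical.

```python
from typing import Dict, Tuple


class InsufficientResources(Exception):
    """Raised when the pool cannot satisfy a request."""


class ResourcePool:
    """Tracks free CPU cores and storage without exposing any hardware."""

    def __init__(self, cores: int, storage_gb: int) -> None:
        self.free_cores = cores
        self.free_storage_gb = storage_gb
        self._leases: Dict[int, Tuple[int, int]] = {}
        self._next_lease_id = 1

    def allocate(self, cores: int, storage_gb: int) -> int:
        """Reserve resources and return a lease id the user can release later."""
        if cores > self.free_cores or storage_gb > self.free_storage_gb:
            raise InsufficientResources("the pool cannot satisfy this request")
        self.free_cores -= cores
        self.free_storage_gb -= storage_gb
        lease_id = self._next_lease_id
        self._next_lease_id += 1
        self._leases[lease_id] = (cores, storage_gb)
        return lease_id

    def release(self, lease_id: int) -> None:
        """Return the resources held by a lease to the pool."""
        cores, storage_gb = self._leases.pop(lease_id)
        self.free_cores += cores
        self.free_storage_gb += storage_gb


if __name__ == "__main__":
    pool = ResourcePool(cores=64, storage_gb=1000)
    lease = pool.allocate(cores=8, storage_gb=100)
    print(pool.free_cores, pool.free_storage_gb)
    pool.release(lease)
    print(pool.free_cores, pool.free_storage_gb)
```

The user handles only quantities and a lease id; which servers, disks, or zones back the lease stays hidden inside the pool.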
The scenario above sets out corresponding requirements for the design of the cloud computing infrastructure, and any cloud computing system design must consider how well it fits this scenario. Of course, the scenario given here is not comprehensive, and a commercial-grade product may need to consider more complex scenarios.