As a new computing model, cloud computing is still in an early stage of development. Providers of many different sizes and types offer their own cloud-based application services. This paper introduces three typical cloud computing implementations, those of Google, Amazon, and IBM, to examine the specific technology behind "cloud computing" and to analyze how current cloud computing platforms and the applications on them are built.
Chen Zheng, Tsinghua University
Example 1: Google's cloud computing platform and its applications
Google's cloud computing technology is in fact tailored to Google's own Web applications. Given the enormous scale of its internal data, Google has built an infrastructure based on large distributed, parallel clusters and relies on software to cope with the frequent node failures that occur in such clusters.
Since 2003, Google has published papers over several years in the top-tier conferences and journals of computer systems research, revealing its internal distributed data-processing methods and its core cloud computing technologies. According to these papers, Google's cloud computing infrastructure consists of four separate but tightly coupled systems: the Google File System, built on top of clusters; the MapReduce programming model; the distributed lock mechanism Chubby; and BigTable, a large-scale distributed database with a simplified model developed by Google.
The Google File System (GFS)
To meet Google's rapidly growing data processing needs, Google designed and implemented the Google File System (GFS). GFS shares many goals with earlier distributed file systems, such as performance, scalability, reliability, and availability. However, its design is also shaped by Google's application workloads and technical environment, mainly in the following four respects:
1. Node failure in a cluster is the norm, not the exception. Because the number of nodes involved in computation is very large, with thousands of nodes often computing together, there are always some nodes in a failed state. Software is therefore needed to monitor the dynamic health of the system, detect errors, and build fault tolerance and automatic recovery into the system.
2. Files in the Google system differ from the usual notion of files in a file system: file sizes are typically measured in gigabytes. Moreover, the meaning of a file is different; a single large file may contain a huge number of what would usually be considered small files. Design assumptions and parameters, such as I/O operation sizes and block sizes, therefore had to be reconsidered.
3. The read and write patterns in GFS differ from those of traditional file systems. Most file modifications in Google's applications, such as search, do not overwrite existing data but append new data at the end of the file; random writes within a file are almost nonexistent. Given this access pattern for large files, caching data blocks on the client loses its value, and the append operation becomes the focus of performance optimization and of atomicity guarantees (an atomic operation, like a transaction, is either executed completely or not at all).
4. Some operations of the file system are no longer transparent and require the cooperation of the application; co-designing the application and the file system API increases the flexibility of the system as a whole. For example, the GFS consistency model is relaxed, which greatly simplifies the design of the file system without placing an onerous burden on applications. An atomic append operation is also introduced so that when multiple clients append to the same file concurrently, no additional synchronization is required.
In short, GFS is designed for Google's own applications. Google is reported to have deployed many GFS clusters; some have more than 1,000 storage nodes and over 300 TB of disk space, and are continuously accessed by hundreds of clients on different machines.
Figure 1 below shows the system architecture of the Google File System. A GFS cluster contains one master server and multiple block servers and is accessed by multiple clients. Files are divided into blocks of fixed size; when each block is created, the master assigns it an immutable, globally unique 64-bit block handle to identify it. Block servers store blocks as Linux files on their local disks and read or write block data according to the specified block handle and byte range. To ensure reliability, each block is replicated to multiple block servers, with three replicas kept by default. The master server manages the file system's metadata, including the namespace, access control information, the mapping from files to blocks, and the current locations of blocks. The GFS client code is linked into each application program; it implements the GFS API, handles communication with the master server and block servers on behalf of the application, and reads and writes data. Clients interact with the master server only for metadata operations; all data operations go directly to the block servers. The client provides an access interface similar to POSIX, but with some modifications, so it does not fully comply with the POSIX standard. Through this joint design of the server side and the client, the Google File System achieves the best possible performance and availability for Google's applications.
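To make this read path concrete, the following minimal Python sketch (an illustration only, not Google's actual implementation; all class and function names are invented here) models how a GFS-style client turns a file offset into a metadata lookup at the master server followed by a direct data request to a block server.

```python
# Hypothetical sketch of a GFS-style read path (illustration only, not Google's code).
BLOCK_SIZE = 64 * 1024 * 1024  # GFS uses fixed-size blocks (64 MB in the published paper)

class Master:
    """Holds metadata only: the mapping from files to block handles and replica locations."""
    def __init__(self, file_blocks):
        self.file_blocks = file_blocks  # filename -> list of (block_handle, [block server ids])

    def lookup(self, filename, offset):
        index = offset // BLOCK_SIZE
        return self.file_blocks[filename][index]  # (handle, replicas); the client caches this

class BlockServer:
    """Stores each block as an ordinary local file, addressed by its 64-bit handle."""
    def __init__(self, blocks):
        self.blocks = blocks            # block_handle -> bytes

    def read(self, handle, start, length):
        return self.blocks[handle][start:start + length]

def gfs_read(master, servers, filename, offset, length):
    handle, replicas = master.lookup(filename, offset)       # metadata comes from the master only
    server = servers[replicas[0]]                             # pick any of the (3 by default) replicas
    return server.read(handle, offset % BLOCK_SIZE, length)   # data flows straight from the block server
```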
The MapReduce distributed programming environment
So that employees without a distributed systems background can build applications on top of large clusters, Google also designed and implemented MapReduce, a programming specification and system for large-scale data processing. With it, programmers who are not distributed computing specialists can write applications for large clusters without worrying about cluster reliability or scalability; application writers need only focus on the application itself, while cluster-related concerns are handled by the platform.
MapReduce expresses computation through two simple concepts, "map" and "reduce": users need only provide their own map function and reduce function to perform large-scale distributed data processing on a cluster.
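As an illustration of the programming model, the following short Python sketch runs the classic word-count example in the MapReduce style within a single process; on a real cluster, the framework would distribute the map tasks, group intermediate values by key, and run the reduce tasks in parallel.

```python
# A minimal word-count sketch in the MapReduce style (single process, for illustration only).
from collections import defaultdict

def map_fn(document):
    """map: emit (word, 1) for every word in one input document."""
    for word in document.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    """reduce: sum all counts emitted for the same word."""
    return word, sum(counts)

def run_mapreduce(documents):
    intermediate = defaultdict(list)
    for doc in documents:                       # map phase
        for key, value in map_fn(doc):
            intermediate[key].append(value)
    return dict(reduce_fn(k, v) for k, v in intermediate.items())  # reduce phase

print(run_mapreduce(["the cloud", "the cluster and the cloud"]))
# {'the': 3, 'cloud': 2, 'cluster': 1, 'and': 1}
```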
Google's text indexing method, the core of its search engine, is said to have been rewritten using MapReduce, yielding a much clearer program structure. Within Google, thousands of MapReduce applications run every day.
The large-scale distributed database management system BigTable
The third part of the cloud computing platform, built on the two foundations above, is BigTable, Google's system for extending database-style storage to a distributed platform. Many applications have very regular data organization, and databases are very convenient for handling such formatted data, but the strong consistency required by relational databases makes them hard to scale out to very large sizes. To handle the large volumes of formatted and semi-structured data inside Google, Google built BigTable, a large database system with weak consistency requirements. Many Google applications are said to be based on BigTable, such as Search History, Maps, Orkut, and the RSS reader.
Figure 2 below shows the BigTable data model. Data is addressed by row, column, and a corresponding timestamp, and all data is stored in the cells of the table. The contents of a BigTable are ordered by row, and ranges of consecutive rows are grouped into small tables, each stored on a server node; such a small table is called a tablet.
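The following toy Python sketch (purely illustrative, not BigTable's implementation) shows the addressing scheme of this data model: a sparse map from (row key, column key, timestamp) to an uninterpreted value, where a read returns the most recent version of a cell by default.

```python
# Toy sketch of the BigTable data model: a sparse map keyed by row, column, and timestamp.
# This only illustrates the addressing scheme, not BigTable's storage engine or tablets.
import time

class ToyTable:
    def __init__(self):
        self.cells = {}                     # (row, column) -> list of (timestamp, value)

    def put(self, row, column, value, ts=None):
        ts = ts if ts is not None else time.time()
        versions = self.cells.setdefault((row, column), [])
        versions.append((ts, value))
        versions.sort(reverse=True)         # keep the newest version first

    def get(self, row, column):
        """Return the most recent value in the cell (BigTable's default behaviour)."""
        versions = self.cells.get((row, column), [])
        return versions[0][1] if versions else None

t = ToyTable()
t.put("com.example.www", "contents:", "<html>...</html>")
t.put("com.example.www", "anchor:cnn.com", "Example")
print(t.get("com.example.www", "contents:"))
```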
These are the three main parts of Google's internal cloud computing infrastructure. In addition to these three components, Google has built a series of related cloud service platforms, including a distributed scheduler, the distributed lock service, and more.
Google's cloud applications
In addition to the cloud computing infrastructure described above, Google has built a series of new Web applications on top of it. Thanks to Web 2.0 techniques for asynchronous network data transfer, these applications give users a new interface and more powerful multi-user interaction. A typical Google cloud application is Google Docs, a network service that competes with Microsoft Office. Google Docs is a Web-based tool with an editing interface similar to Microsoft Office, simple and easy-to-use document rights management, and a record of every modification users make to a document. These features make Google Docs well suited to online sharing and collaborative editing, and it can even be used to track project progress with clear responsibilities and goals. Currently, Google Docs offers editing modules for several functions, including documents, spreadsheets, presentations, and schedule management, replacing some of the corresponding features of Microsoft Office. Notably, this style of cloud application is ideal for sharing and editing among multiple users, making it very convenient for a team to work together.
Google Docs illustrates an important pattern of cloud computing applications: accessing large-scale remote storage and computing services through a browser. Cloud computing can thus lay a solid foundation for a new generation of large-scale network applications.
Although Google can be considered the biggest practitioner of cloud computing, Google's cloud computing platform is a private environment; in particular, its cloud computing infrastructure is not yet open. Apart from a limited set of open interfaces, such as GWT (the Google Web Toolkit) and the Google Maps API, Google does not share the internal infrastructure of its cloud with external users; all of it remains privately owned.
Fortunately, Google has disclosed a subset of its internal cluster computing environment in its publications, enabling developers worldwide to build open-source, large-scale data-processing cloud infrastructure based on those documents; the best known such effort is the Apache Hadoop project. The following two cloud computing implementations, by contrast, provide cloud computing platforms to external developers and to small and medium-sized companies, so that developers can build their own new network applications on top of cloud infrastructure. IBM's Blue Cloud computing platform is a product offered for sale: users build their own cloud computing platforms from its hardware and software. Amazon's Elastic Compute Cloud is a hosted cloud computing platform that users can access directly through a remote interface.
Example 2: Amazon's Elastic Compute Cloud
Amazon is the largest online retailer on the Internet, but it also provides cloud computing services to independent developers and companies. Amazon calls its cloud computing platform the Elastic Compute Cloud (EC2), and it was the first company to offer a remotely accessible cloud computing platform.
Open service
Amazon's service differs from Google's cloud computing offering: Google provides its cloud computing platform only for its own applications on the Internet, so independent developers and companies cannot work on that platform and can only develop cloud applications with the support of the open-source Hadoop software. Amazon's Elastic Compute Cloud also differs from IBM's cloud computing platform: Amazon does not sell physical cloud computing hardware and has no product comparable to Blue Cloud. Instead, Amazon has built its Elastic Compute Cloud on the company's own large-scale internal clusters. Users operate the various instances running on the cloud computing platform through the Elastic Compute Cloud's Web interface, and billing is determined by actual usage: users pay only for the computing resources they use and are billed after their workloads finish running.
In terms of its evolution, the Elastic Compute Cloud was not Amazon's first such offering; it grew out of the existing Amazon Web Services platform. As early as March 2006, Amazon released the Simple Storage Service (S3), which charges a monthly, rent-like fee for the storage used; users also pay for the corresponding network traffic. The Amazon Web Services platform uses standard interfaces such as REST (Representational State Transfer) and the Simple Object Access Protocol (SOAP), through which users access the corresponding storage services.
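As a concrete illustration of using such a storage service, the sketch below stores and retrieves one object in S3 using the boto3 Python SDK, a present-day client library rather than the original 2006 interface; the bucket name is a hypothetical placeholder, and credentials are assumed to be configured already.

```python
# Minimal S3 usage sketch with the boto3 SDK; the bucket is assumed to exist.
# Under the hood, boto3 talks to S3's REST interface over HTTPS.
import boto3

s3 = boto3.client("s3")

# Upload a small object (the key acts like a file name inside the bucket).
s3.put_object(Bucket="example-bucket", Key="docs/hello.txt", Body=b"hello, cloud")

# Download it again and print the content.
response = s3.get_object(Bucket="example-bucket", Key="docs/hello.txt")
print(response["Body"].read().decode())
```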
In July 2007, Amazon introduced the Simple Queue Service (SQS), a hosted service that stores messages sent between computers. With this service, application writers can transfer data between distributed programs without worrying about lost messages. It also does not matter if the receiving module has not yet started: the message is cached inside the service, and once a receiving component starts running, the queue service delivers the message to the appropriate module for processing. As with storage, users pay for the messaging service; the billing rule is similar, charging by the number of messages and the amount of data delivered.
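The following sketch, again using the present-day boto3 SDK rather than the original 2007 API, shows the send-and-receive pattern described above; the queue name is a hypothetical placeholder.

```python
# Minimal SQS usage sketch with boto3: the sender and receiver never talk to each other
# directly; the queue buffers the message until some consumer is running.
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.create_queue(QueueName="example-work-queue")["QueueUrl"]

# Producer side: enqueue a message even if no consumer is running yet.
sqs.send_message(QueueUrl=queue_url, MessageBody="process item 42")

# Consumer side: poll for messages, handle them, then delete them from the queue.
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=5)
for msg in resp.get("Messages", []):
    print("received:", msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```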
In providing the above services, Amazon did not develop the corresponding network service components from scratch; rather, it optimized and streamlined the company's existing platform so that, while continuing to satisfy the needs of its own online retail and shopping applications, the platform could also be used by external developers.
Following the opening of these service interfaces, Amazon went further, developed the EC2 system on this basis, and opened it to external developers.
Flexible working mode
Amazon's cloud computing model follows its tradition of simplicity and ease of use and is built on the company's existing cloud computing infrastructure. Users of the Elastic Compute Cloud interact with their instances using SOAP over HTTPS; HTTPS secures the remote connection and prevents user data from leaking in transit. In terms of usage model, the Elastic Compute Cloud provides users and developers with a virtual cluster environment, which both makes user applications flexible and eases the management burden on the platform's owner, Amazon.
The instances in the Elastic Compute Cloud are virtual machine servers that are actually running; each instance represents one running virtual machine. For a virtual machine provided to a user, the user has full access, including administrator rights on that machine. Virtual servers are charged according to the computing power of the virtual machine, so what the user rents is in effect virtual computing capacity, which simplifies billing. The Elastic Compute Cloud offers three virtual machine instance types with different capabilities and different prices. For example, the default and smallest instance has 1.7 GB of memory, 1 EC2 compute unit (a unit pegged to one virtual compute core), and 160 GB of instance storage; it is a 32-bit platform and is charged at 10 cents per hour. The platform also offers two more powerful instance types, which of course cost more.
When users deploy network applications, they typically use more than one running instance, with many instances working together. The Elastic Compute Cloud therefore provides an internal network among instances so that a user's application can communicate across different instances. Each compute instance has an internal IP address, and user programs can use these internal addresses for data communication to obtain the best communication performance. Each instance also has an external address, and users can bind their own elastic IP addresses to running instances so that a service system built on the Elastic Compute Cloud can serve the outside world. Of course, Amazon also charges for network traffic, with separate billing rules for internal and external transfers.
All in all, by providing the Elastic Compute Cloud, Amazon relieves small software developers of the burden of maintaining cluster systems, and the charges are relatively straightforward: users pay only for the resources they actually use. This payment method differs from traditional hosting. In the traditional hosting model, users place their machines with a hosting company and generally pay according to maximum or planned capacity rather than actual use; they may also have to pay extra to guarantee the reliability and availability of the service, and in many cases the resources are never fully used. Under Amazon's model, users pay only for actual usage.
In terms of user workflow, Amazon's Elastic Compute Cloud requires users to create a server image conforming to Amazon's specification, called an Amazon Machine Image (AMI). The goal of the Elastic Compute Cloud is that a server image can contain any operating system, applications, configuration, login arrangements, and security settings the user wants, though at present only the Linux kernel is supported. Users create their own AMI or use one pre-built by Amazon, upload the AMI to the Elastic Compute Cloud platform, and then call Amazon's application programming interface (API) to use and manage it. An AMI is in effect a virtual machine image, and users can use it for any kind of work, such as running a database server, building a platform for fast network downloads, providing an external search service, or even renting out a special-purpose AMI for profit. The multiple AMIs owned by a user can communicate with one another, just as on a cluster computing platform.
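To illustrate the "prepare an image, then drive it through the API" workflow, the sketch below launches and later terminates an instance from an AMI using the boto3 Python SDK, a present-day client rather than the original SOAP interface; the AMI ID, key pair, and instance type are hypothetical placeholders.

```python
# Launch and later terminate an EC2 instance from an AMI; the AMI ID, key pair,
# and instance type here are placeholders, not values taken from the article.
import boto3

ec2 = boto3.client("ec2")

result = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical AMI selected or uploaded by the user
    InstanceType="t2.micro",           # hypothetical instance type
    KeyName="my-keypair",              # hypothetical SSH key pair for administrator access
    MinCount=1,
    MaxCount=1,
)
instance_id = result["Instances"][0]["InstanceId"]
print("started", instance_id)

# ...use the instance (billing accrues per unit of time while it runs)...

ec2.terminate_instances(InstanceIds=[instance_id])  # billing stops once it is terminated
```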
Looking ahead, Amazon has also planned how to help users develop Web 2.0 applications on top of the cloud computing platform. Amazon believes that, beyond its online retail business, cloud computing is part of Amazon's core value. It is foreseeable that Amazon will add more network service modules to the Elastic Compute Cloud platform to make it easier for users to build cloud computing applications.
Example 3: IBM's Blue Cloud computing platform
IBM launched the Blue Cloud computing platform on November 15, 2007, offering customers a cloud computing platform they can buy. It includes a range of cloud computing products that, by building a distributed, globally accessible resource structure, allow computing to run in a network-wide environment rather than being limited to local machines or remote server farms (that is, server clusters).
IBM's technical white paper offers a glimpse of the internal structure of the Blue Cloud computing platform. Blue Cloud builds on IBM's expertise in large-scale computing and is based on open standards and open-source software, supported by IBM software, systems technology, and services. Simply put, Blue Cloud is based on the cloud infrastructure of IBM's Almaden Research Center and includes Xen and PowerVM virtualization, Linux operating system images, and Hadoop with its distributed file system and parallel processing framework. Blue Cloud is supported by IBM Tivoli software, which manages servers to ensure optimal performance under the prevailing demand; this includes software that can allocate resources across multiple servers in real time, providing customers with a seamless experience, accelerating performance, and ensuring stability even in the most demanding environments. IBM's newly released Blue Cloud plan helps users build their own cloud computing environments; it integrates Tivoli, DB2, WebSphere, and hardware products (currently x86 blades) to construct a distributed, globally accessible resource structure for the enterprise. According to IBM's plan, the first Blue Cloud product, supporting Power and x86 processor blade server systems, will launch in 2008, followed by a cloud environment based on System z mainframes and one based on high-density rack clusters.
IBM's cloud computing white paper shows the following configuration of the Blue Cloud computing platform.
Figure 4 below illustrates the high-level architecture of Blue Cloud. As the figure shows, within a data center the Blue Cloud computing platform consists of IBM Tivoli Provisioning Manager (deployment management software), IBM Tivoli Monitoring, IBM WebSphere Application Server, the IBM DB2 database, and a number of virtualization components. The architecture in the diagram mainly describes the back end of the cloud computing platform and does not cover the user-facing front end.
There is nothing special about Blue Cloud's hardware platform, but its software platform differs from earlier distributed platforms, mainly in its use of virtual machines and its deployment of Apache Hadoop for large-scale data processing. Hadoop was developed by Web developers based on Google's published papers; it provides the Hadoop Distributed File System, modeled on the Google File System, and implements the corresponding MapReduce programming specification. The community is also developing a lock service similar to Google's Chubby and a corresponding distributed database management system in the style of BigTable. Because Hadoop is open source, user organizations can modify it directly to suit the specific needs of their applications. IBM's Blue Cloud offering integrates the Hadoop software directly into its cloud computing platform.
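To show what the same MapReduce pattern looks like when run on Hadoop, here is a minimal word-count job written for Hadoop Streaming; the file name and submission command are illustrative only and assume a working Hadoop installation.

```python
# wordcount.py for Hadoop Streaming (illustrative; assumes a working Hadoop installation).
# A streaming job would be submitted roughly like:
#   hadoop jar hadoop-streaming.jar -input in/ -output out/ \
#       -file wordcount.py -mapper "wordcount.py map" -reducer "wordcount.py reduce"
# Hadoop sorts the mapper output by key before it reaches the reducer.
import sys

def mapper():
    """Read raw text from stdin and emit one 'word<TAB>1' line per word."""
    for line in sys.stdin:
        for word in line.split():
            print(f"{word.lower()}\t1")

def reducer():
    """Read key-sorted 'word<TAB>count' lines from stdin and sum the counts per word."""
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current and current is not None:
            print(f"{current}\t{total}")
            total = 0
        current = word
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```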
Virtualization in the Blue Cloud
From the structure of Blue Cloud we can also see that, because virtualization technology is used inside Blue Cloud, the software stack running on each node differs greatly from a traditional software stack. Virtualization in the cloud can be applied at two levels. One level is virtualization of the hardware: on IBM p-series servers, the hardware can be divided into logical partitions (LPARs), whose CPU resources are managed through IBM Enterprise Workload Manager. Combined with resource allocation policies applied during actual use, this approach allows resources to be allocated appropriately to each logical partition. The granularity of logical partitioning on p-series systems is 1/10 of a CPU.
The other level of virtualization is provided by software; the Blue Cloud computing platform uses the Xen virtualization software. Xen is open-source virtualization software that can run other operating systems on top of an existing Linux system, allowing software to be deployed and operated flexibly through virtual machines.
Managing cloud computing resources through virtual machines has particular benefits. Because a virtual machine is a special kind of software that can completely simulate the execution of hardware, a full operating system can run inside it, preserving a complete set of runtime-environment semantics. The entire execution environment can therefore be packaged up and moved to other physical nodes, isolating the execution environment from the physical environment and making it easy to deploy whole application modules. In general, applying virtualization technology to a cloud computing platform brings several desirable properties:
1. The cloud computing management platform can dynamically relocate a virtual machine onto whatever physical platform is required without stopping the applications running on it; this is more flexible than the process-migration approaches used before virtualization was adopted (see the sketch below).
2. Host resources can be used more efficiently: several lightly loaded virtual machine nodes can be consolidated onto the same physical node, so that idle physical nodes can be powered down to save energy.
3. By dynamically migrating virtual machines across physical nodes, load balancing can be achieved independently of the applications. Because a virtual machine contains the entire operating system and application environment, the whole runtime environment moves with the migration, and no application-specific handling is required.
4. Deployment is also more flexible, since a virtual machine image can be deployed directly onto a physical computing platform.
In a nutshell, a cloud computing platform gains great flexibility through virtualization; without virtualization, many limitations remain.
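As a rough illustration of the live migration described in point 1, the sketch below uses the libvirt Python bindings, which can manage Xen among other hypervisors; the host URIs and domain name are hypothetical placeholders, and this is not part of IBM's Blue Cloud software.

```python
# Live-migrate a running virtual machine between two hosts via libvirt.
# Host URIs and the domain name are placeholders; libvirt supports Xen, KVM, and others.
import libvirt

src = libvirt.open("xen+ssh://node01/")          # source hypervisor
dst = libvirt.open("xen+ssh://node02/")          # destination hypervisor

dom = src.lookupByName("app-vm-01")              # the running guest to move

# VIR_MIGRATE_LIVE keeps the guest running while its memory is copied over,
# so the applications inside it are not stopped during the move.
dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)

print("guest now running on:", dst.getHostname())
```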
Storage structure in the Blue Cloud
The storage architecture of the Blue Cloud computing platform is also important to cloud computing, since the operating systems, the service programs, and the users' application data are all kept in the storage system. Cloud computing does not exclude any useful storage architecture, but the architecture needs to be matched to the application's requirements to obtain the best performance. On the whole, the storage architecture of cloud computing includes two different approaches: cluster file systems such as the Google File System, and block-device-based storage area networks (SANs).
When designing the storage architecture of a cloud computing platform, capacity is not the only concern. In fact, with growing hard disk capacity and falling prices, current disk technology makes it easy to obtain large total capacity simply by using many disks. Compared with capacity, the read and write speed of disk data is the more important problem for cloud platform storage. The speed of a single disk can easily limit an application's access to data, so in practice data needs to be spread across multiple disks, with reads and writes issued to multiple disks in parallel, to achieve higher speed. How data is placed is therefore a very important issue in a cloud computing platform: in practice, data must be distributed across multiple disks on multiple nodes. Current storage technology offers two ways to achieve this: one is a cluster file system similar to the Google File System, the other is a block-based storage area network (SAN) system.
The Google File System has already been described above; the IBM Blue Cloud computing platform uses its open-source counterpart, HDFS (the Hadoop Distributed File System). In this approach, the disks are attached locally to the nodes, a shared distributed file system namespace is exposed to the outside, and redundancy is provided at the file system level to improve reliability. With a suitable distributed data processing model, this approach can improve overall data processing efficiency. This Google File System style of architecture differs greatly from a SAN system.
A SAN system is another storage architecture choice for cloud computing platforms, and it is also reflected in the Blue Cloud platform: IBM provides a way to connect SANs to the Blue Cloud computing platform. Figure 5 is a schematic diagram of a SAN system.
As Figure 5 shows, a SAN is a network on the storage side: multiple storage devices are combined into a single storage area network, and the front-end hosts access the back-end storage devices over this network. Because access is at the block-device level, it is independent of the front-end operating system. Several connection options are available for a SAN. One option is a Fibre Channel network, which can drive fast fibre disks and suits environments with high performance and reliability requirements. Another option is Ethernet with the iSCSI protocol, which runs in an ordinary LAN environment and thereby reduces cost. Because the disk devices in a storage area network are not bound to a single host, the structure is very flexible, and a host can access multiple disk devices to gain performance. In a storage area network, a virtualization engine maps logical devices onto physical devices and manages the front-end hosts' reads and writes of back-end data; the virtualization engine is therefore a very important management module of the storage area network.
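To make the role of the virtualization engine concrete, here is a toy Python sketch (purely illustrative, not IBM's or any vendor's implementation) of how the block addresses of a logical volume might be mapped onto extents of different physical disks.

```python
# Toy model of a SAN virtualization engine: a logical volume is a sequence of
# fixed-size extents, each mapped to (physical disk, extent index). Front-end hosts
# address the logical volume; the engine redirects each I/O to the right disk.
EXTENT_SIZE = 1024 * 1024  # 1 MiB extents (illustrative)

class VirtualizationEngine:
    def __init__(self):
        # logical volume name -> list of (disk_id, physical_extent_index)
        self.volume_map = {"lun0": [("diskA", 7), ("diskB", 3), ("diskA", 8)]}

    def resolve(self, volume, logical_byte_offset):
        """Translate a logical byte offset into (disk, physical byte offset)."""
        extent_index = logical_byte_offset // EXTENT_SIZE
        disk, physical_extent = self.volume_map[volume][extent_index]
        return disk, physical_extent * EXTENT_SIZE + logical_byte_offset % EXTENT_SIZE

engine = VirtualizationEngine()
print(engine.resolve("lun0", 1_500_000))   # -> ('diskB', 3597152)
```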
SAN systems and distributed file systems such as the Google File System are not competing technologies; they are two options available when building a cluster system. Where a SAN is chosen, applications still need an upper-layer semantic interface for reading and writing, so a file system must be built on top of the SAN; the Google File System, by contrast, is itself a distributed file system and could therefore be built on top of a SAN. Overall, SANs and distributed file systems can provide similar functionality, such as error handling; which to use, and how, is decided by the applications built on the cloud computing platform.
Unlike Google, IBM does not provide an externally accessible network application based on cloud computing, mainly because IBM is not an Internet company but an IT service company. Of course, IBM's internal operations and the software services it offers its customers in the future can be based on a cloud computing architecture. (Note: This work was supported by the National 973 Program, Grant No. 2007CB310900, and the National Natural Science Foundation of China, Grant No. 90718040.)