1.1 Eucalyptus: an open-source implementation of EC2

Eucalyptus [22], developed by Daniel Nurmi of the University of California, is an open-source software infrastructure for cloud computing. It is an open-source implementation of Amazon EC2 and is compatible with the interfaces of the commercial EC2 service. Eucalyptus is a software framework aimed at the research community. Unlike other IaaS cloud computing systems, Eucalyptus can be deployed on existing commodity resources. It adopts a modular design whose components can be replaced and upgraded, which gives researchers a good platform for cloud computing research. Eucalyptus is designed to be easy to scale, install, and maintain. Like EC2, Eucalyptus relies on Linux and Xen for operating system virtualization. This section describes the design, architecture, and component functions of Eucalyptus. The Eucalyptus system is available for download and can be installed and used on clusters and in a variety of personal computing environments. Eucalyptus is likely to attract more attention as research in this area deepens.

1. Development Purpose

Users can draw on a wide range of computing and storage resources in many ways, from a single laptop to thousands of compute nodes distributed around the world. Users generally locate these resources by characteristics such as hardware architecture, memory and disk capacity, network connectivity, or geographic location. Locating resources in this way usually involves complex issues such as resource availability, application performance analysis, software service requirements, and administrative relationships. High-performance computing and grid computing have made great strides in standardizing resource configuration [23] [25]. However, these standards are still cumbersome for users who have complex resource requirements.
For example, a user who needs a large amount of computing resources must contact several different resource providers to meet that need. The resources in these pools are mostly heterogeneous, which makes task performance analysis and efficient resource utilization very difficult. Although expert users can cope with heterogeneous resources, many users prefer a development and runtime environment with uniform hardware, software stacks, and programming environments, because this consistency makes it easier to develop and deploy large-scale applications.

The basic function of cloud computing is to provide access to large-scale data and computing resources through a variety of interfaces. Current cloud computing systems follow essentially the same principles: resources are acquired and released on demand, and user interfaces should be very simple. In addition, the resources provided by cloud computing systems use virtualization to hide details such as the physical location of resources and the architecture of the underlying computing resources. This model has been widely accepted and gives developers a new programming target for building scalable applications.

As cloud computing systems grow in number and scale, a number of important problems need to be studied so that cloud platforms can develop as expected and succeed. At present, however, most cloud computing products are either proprietary or depend on large-scale infrastructure and software that are not open to the research community, so researchers cannot freely modify them or run experiments on them [22]. This lack of research tools leaves many basic problems unsolved. Eucalyptus is designed to support cloud computing research and infrastructure development.
It is based on infrastructure as a service (IaaS). Unlike cloud computing providers such as Google, Amazon, Salesforce, and 3Tera, Eucalyptus runs on computing and storage infrastructure, such as clusters and workstations, that academic research organizations already have. It provides them with a modular, open platform for research and experimentation that lets users run and control entire virtual machine instances deployed across a variety of physical resources. Eucalyptus is designed to be modular so that researchers can experiment with the security, scalability, resource scheduling, and interface implementation of cloud computing, which helps the wider research community explore the field.

2. Design Principles

Although cloud computing systems already offer users a range of services, the closed nature of their software makes it hard for people interested in cloud computing to find a public, flexible framework for building their own experiments. Eucalyptus is a research-oriented open-source cloud computing system. To meet the needs of many researchers, its design follows two principles [22] [30]:

(1) Eucalyptus must be able to be deployed and run in hardware and software environments that its designers do not control.
(2) Eucalyptus must be modular, so that different researchers can upgrade, modify, and replace its components, while remaining as scalable as possible.

The Eucalyptus system architecture takes both principles into account and balances them.

3. Eucalyptus and IaaS

Although most existing cloud computing implementations share the principles of flexibility, scalability, and dynamic computing capacity, they differ considerably in how they deliver that capacity to users. For example, Amazon's Elastic Compute Cloud (EC2) [23] [27] lets users allocate entire virtual machines on demand, providing infrastructure as a service (IaaS).
It allows users to supply their own operating system kernel, base operating system software, any user-level software, and the applications they want to run. The IaaS system is responsible only for providing physical resources and instantiating the users' virtual machines.

The Eucalyptus implementation of IaaS is designed specifically to be easy to install and maintain in a research environment, and to be modified, experimented with, and extended. Commercial cloud providers can control the configuration of their local resources (such as hardware versions, operating system versions, and networking and storage policies) and have access to potentially expensive sets of resources. In a research environment, a cloud computing infrastructure cannot impose a specific configuration on all of its hardware and software, nor can it assume that a large pool of resources is available to guarantee system performance. A typical IaaS system runs on dedicated infrastructure; it treats neither scalability and portability nor ease of management as a primary design goal. Aggregating multiple computing resources into a single resource pool makes a cloud computing system harder to design, and few open-source packages can be installed across multiple computing clusters that cooperate to execute tasks. Eucalyptus is therefore a distinctive example of IaaS and a pioneer of open-source multi-cluster design.

IaaS is not the only way in which commercial providers implement cloud computing [22], as shown in the figure. Amazon and Google also provide data as a service (DaaS) through S3 (Simple Storage Service) and App Engine, which let users store and access large amounts of data from the computing resources provided. Similarly, Google's App Engine offers a language-level abstraction that provides platform as a service (PaaS).
In addition, companies such as Salesforce provide software as a service (SaaS).

Eucalyptus chose IaaS as its design approach for two reasons. First, Amazon EC2, which implements IaaS, is arguably the most commercially successful cloud computing system. Eucalyptus uses EC2-compatible interfaces, which makes it possible to test against a relatively mature commercial cloud computing system, and adopting IaaS lets Eucalyptus use EC2 as a reference during design and testing. Second, higher-level cloud abstractions depend, at least conceptually, on IaaS-like functionality. Open-source cloud computing systems that include IaaS will therefore be necessary and useful for further research and deployment.

1.1.2 Architecture

Eucalyptus has two engineering goals: extensibility and non-intrusiveness. Its simple organization and modular design make it easy to extend. Eucalyptus is built on open-source web service technology, and its internal structure is easy to follow. Each Eucalyptus component consists of several web services with well-defined interfaces described in WSDL documents, and supports secure communication through WS-Security policies. Eucalyptus relies on industry-standard software packages such as Axis2, Apache, and Rampart.

These implementation technologies also support the second design goal: non-intrusive, or overlay, deployment. Eucalyptus does not require users to dedicate all of their machines to it, nor does it modify local software configurations in potentially destructive ways. It requires only that Eucalyptus nodes support virtualized execution through Xen and the deployment of web services. Eucalyptus can be installed and run without modifying the underlying infrastructure.

Academic research organizations typically have access to a variety of resources, such as small clusters, workstation pools, and assorted servers and desktops.
Because IP addresses are scarce and exposing resources fully to the Internet raises security concerns, system administrators usually deploy clusters on private, non-routable networks, with a single head node routing traffic between the compute pool and the Internet. Although this configuration provides security while using a minimal number of publicly routable IP addresses, it means that most machines can connect to external hosts, but external hosts cannot communicate directly with machines inside the cluster.

For example, consider two small Linux clusters, a small server pool, and a set of workstations. Each cluster has a front-end machine with a public IP address, and its nodes are connected through a private network. The servers and workstations have public IP addresses, but the workstations sit behind a firewall and cannot be reached from outside. In this case a fully interconnected system clearly cannot be installed, because many machines can only initiate connections to external hosts or are completely isolated from external networks. In addition, nodes in the two clusters may have overlapping IP addresses because they are on different private networks.

To use all of these resources in a single cloud computing system, Eucalyptus adopts a hierarchical architecture [22], as shown in Figure 6-13, where CLC denotes the cloud controller, CC the cluster controller, and NC the node controller.
Figure 6-13 Layered topology of Eucalyptus

These layered components can be installed easily on common hierarchical network structures. A typical Eucalyptus deployment is shown in Figure 6-14 [22]:
Figure 6-14 Typical Eucalyptus deployment example

1.1.3 Main Components

The main components of Eucalyptus are the node controller, the cluster controller, and the cloud controller.

1. Node Controller

A node controller manages one physical node. It is the component that runs on each physical resource that hosts virtual machines, and it is responsible for starting, inspecting, shutting down, and cleaning up virtual machine instances. A typical Eucalyptus installation has many node controllers, but only one needs to run on each machine, because a single node controller can manage all the virtual machine instances running on that node.

The node controller interface is described in a WSDL document, which defines the instance data structure and the instance control operations that the node controller supports: runInstance, describeInstance, terminateInstance, describeResource, and startNetwork. The operations that run, describe, and terminate an instance perform a minimal amount of system configuration and call the current hypervisor to control and monitor the running instance. The describeResource operation returns the characteristics of the physical resource to the caller, including processor resources, memory, and disk capacity. The startNetwork operation sets up and configures the virtual Ethernet, which is discussed below.

2. Cluster Controller

A cluster controller typically runs on a cluster's head node or on a server that has access to both the private and public networks. A cluster controller can manage multiple node controllers. It collects status information from the node controllers that belong to it, schedules incoming VM instance execution requests to individual node controllers based on that status information, and manages the configuration of the public and private instance networks.
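The status-based placement performed by the cluster controller can be illustrated with a minimal Python sketch. It assumes the simplest possible policy, taking the first node controller that reports enough free resources. The class names, the fields of `Resources`, and the method signatures are hypothetical stand-ins for the WSDL operations named above, not the actual Eucalyptus code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Resources:
    """Capacity figures of the kind describeResource reports (assumed fields)."""
    cores: int
    memory_mb: int
    disk_gb: int

    def fits(self, need: "Resources") -> bool:
        """True if this capacity can hold an instance that needs `need`."""
        return (self.cores >= need.cores
                and self.memory_mb >= need.memory_mb
                and self.disk_gb >= need.disk_gb)

    def minus(self, other: "Resources") -> "Resources":
        return Resources(self.cores - other.cores,
                         self.memory_mb - other.memory_mb,
                         self.disk_gb - other.disk_gb)

    def plus(self, other: "Resources") -> "Resources":
        return Resources(self.cores + other.cores,
                         self.memory_mb + other.memory_mb,
                         self.disk_gb + other.disk_gb)


@dataclass
class NodeController:
    """Hypothetical stand-in for the node controller on one machine."""
    name: str
    free: Resources
    instances: Dict[str, Resources] = field(default_factory=dict)

    def describe_resource(self) -> Resources:
        return self.free

    def run_instance(self, instance_id: str, need: Resources) -> None:
        if not self.free.fits(need):
            raise RuntimeError(f"{self.name}: insufficient resources")
        self.free = self.free.minus(need)
        self.instances[instance_id] = need

    def terminate_instance(self, instance_id: str) -> None:
        # Give the instance's resources back to the node.
        self.free = self.free.plus(self.instances.pop(instance_id))


class ClusterController:
    """Places each request on the first node that has room (assumed policy)."""

    def __init__(self, nodes: List[NodeController]) -> None:
        self.nodes = nodes

    def run_instance(self, instance_id: str, need: Resources) -> Optional[str]:
        for node in self.nodes:
            if node.describe_resource().fits(need):
                node.run_instance(instance_id, need)
                return node.name
        return None  # no node can host the instance
```

With two nodes where only the second has two free cores, a request for two cores lands on the second node, and a request that fits nowhere returns `None`. First-fit keeps the scheduler trivial to reason about, at the cost of ignoring load balance.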
Like the node controller, the cluster controller interface is described in a WSDL document. Its operations are runInstances, describeInstances, terminateInstances, and describeResources. Requests to describe or terminate instances are passed directly to the relevant node controllers. When the cluster controller receives a runInstances request, it performs a simple scheduling task: it calls describeResource on each node controller and selects the first node controller with enough free resources to run the instance. The cluster controller also implements describeResources, which takes the resources occupied by one instance as input and returns the number of such instances that can run on the node controllers it manages.

3. Cloud Controller

Each Eucalyptus installation includes a single cloud controller. The cloud controller acts as the central nervous system: it is the user-visible entry point and the component that makes global decisions. It handles incoming requests from users and administrative requests from the system administrator, makes high-level VM instance scheduling decisions, processes service level agreements, and maintains system and user metadata. The cloud controller consists of a group of services that process user requests, perform authentication, maintain system and user metadata (virtual machine images and SSH key pairs), and manage and monitor running virtual machine instances. These services are configured and managed by an enterprise service bus (ESB), through which they are also published.

Eucalyptus is designed to emphasize transparency and simplicity, to make it easy to experiment with and extend. To allow extension at this level of granularity, the cloud controller's components include a virtual machine scheduler, an SLA engine, user interfaces, and management interfaces.
These components are modular and independent, and expose well-defined external interfaces. The enterprise service bus (ESB) controls and manages the interactions and cooperation between them. By using web services and the Amazon EC2 query interface to interoperate with EC2 client tools, the cloud controller can work like Amazon EC2. EC2 was chosen because it is relatively mature, has a large user base, and implements IaaS well.

1.1.4 Access Interfaces

The web service interface package in the cloud controller has three important interfaces: the client interface, the management interface, and the instance control interface.

1. Client Interface

The client interface of the cloud controller is essentially a translator between Eucalyptus's internal system interfaces and externally defined client interfaces. For example, Amazon provides a WSDL document describing the SOAP-based web service client interface to its service, along with a document describing its HTTP query-based interface; requests in both forms can be translated into internal Eucalyptus objects by the cloud controller's user interface service. The JiBX [26] binding tool is used to specify the mapping between XML elements and Java object instances, and in the same way it can be used to create mappings between EC2 SOAP messages and internal Eucalyptus objects. However, the query interface does not fit this model, mainly for the following reasons:

(1) No XML document is available.
(2) The authentication mechanism is different and conflicts with the WS-Security policy in use.
(3) The SOAP request and the query request structure the same fields of the same request in conflicting ways.
However, because the EC2 query interface is a strict subset of the SOAP interface, Eucalyptus includes a simple binding framework that maps HTTP parameter names to object fields, guided by annotations; annotations on the target object are then used to identify and resolve inconsistencies in the mapping. Finally, JiBX marshals the object using the namespace of the EC2 SOAP interface. This has two results:

(1) JiBX validates the object, confirming that it is both a legal SOAP interface request and a legal EC2 client request.
(2) The marshalled XML document can be processed further as part of a SOAP message.

2. Management Interface

In addition to main tasks such as starting and stopping instances, a cloud computing infrastructure should support basic administrative tasks such as adding and removing users and managing disk images. Eucalyptus supports these operations through a web-based interface implemented by the cloud controller, or through the command line. The single management interface is visible only to system administrators.

A user can be added to Eucalyptus either manually by an administrator or by applying online. Online registration usually requires the user to provide an email address and click a verification link sent by the system, because Eucalyptus ties a user's identity to the email address. After a user has been added, the administrator can remove the user temporarily or permanently, and can manage and terminate the user's running instances.

Disk images can also be added to the system through the management portal. An image consists of a Xen-compatible guest operating system kernel, a root file system image, and an optional RAM disk image. Adding an image involves uploading these three files and naming them. The administrator can remove an image temporarily or permanently.
In addition, the administrator can add or remove nodes in a cluster through the controller configuration file.

3. Instance Control Interface

The cloud controller provides a virtual machine control service (vmcontrol service) that manages the creation of virtual machine instance metadata. The VM controller continuously maintains a simple local description of the underlying resource state, such as the number of instances that each cluster controller could potentially create. When an instance creation request arrives, the VM controller works with the other cloud controller services to break the user's request down into images, key pairs, networks, and security groups, produces a plan in advance based on the corresponding metadata and the resource allocation policy, and then dispatches messages to the cluster controllers involved. Each cluster controller schedules the requests onto the node controllers it manages, and finally the node controllers create the virtual machine instances that run the user's jobs and applications.

1.1.5 Service Level Agreement

The service level agreement (SLA) is implemented as an extension of the message processing service. The message processing service can inspect, modify, and discard messages, as well as the state stored by the VM controller (vmcontrol). The VM controller determines which resources are to be used and enforces system-level or user-specified service level agreements. It relies on a local state model to make these decisions. The model obtains the availability, allocation, virtual network, and registered image status of instances through the cluster controllers, and the VM controller uses this information and its update events to make global service decisions. Eucalyptus implements an extensible SLA model that, together with the state model and event processing, supports further quantitative research on SLAs.
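The idea of an SLA as a message-processing extension that can inspect, modify, or discard a request against a local state model can be sketched as a chain of handlers. This is an illustrative Python sketch only: `RunRequest`, the shape of the state model (zone name mapped to free instance slots), and the two handlers are assumptions made for the example, not Eucalyptus interfaces.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class RunRequest:
    """Hypothetical instance request travelling through the message pipeline."""
    user: str
    count: int
    zone: Optional[str] = None  # None means the user left the choice open


# Assumed local state model: zone name -> number of free instance slots.
StateModel = Dict[str, int]
Handler = Callable[[RunRequest, StateModel], Optional[RunRequest]]


def choose_idle_zone(req: RunRequest, state: StateModel) -> Optional[RunRequest]:
    """Modify the message: fill in the zone with the most free slots."""
    if req.zone is not None or not state:
        return req
    return replace(req, zone=max(state, key=lambda z: state[z]))


def require_capacity(req: RunRequest, state: StateModel) -> Optional[RunRequest]:
    """Discard the message if the chosen zone cannot hold the request."""
    if req.zone is None or state.get(req.zone, 0) < req.count:
        return None
    return req


def process(req: RunRequest, state: StateModel,
            handlers: List[Handler]) -> Optional[RunRequest]:
    """Run the request through every handler; None means it was discarded."""
    current: Optional[RunRequest] = req
    for handler in handlers:
        current = handler(current, state)
        if current is None:
            return None
    if current.zone is None or current.zone not in state:
        return None
    # Commit: update the local model so later decisions see the allocation.
    state[current.zone] -= current.count
    return current
```

Keeping each rule as a separate handler means a new SLA policy can be added by appending a function to the list, without touching the dispatch loop.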
The VM controller relies on a local model to make decisions. To keep the model up to date, the cloud controller passively polls for instance status, including instance availability, allocation, virtual networks, and registered images. The information obtained by polling serves as the baseline for decisions. A user request is submitted as a transaction and is committed for processing once the resources satisfy it. However, when information is lost or resource state changes because of network problems, the model may become inconsistent, and a service level agreement made with a user may turn out to be one that the system cannot satisfy. Because polling is semi-synchronous, such information loss can be detected and the points in time at which the model was invalid can be identified. The system can therefore work out during which parts of a given period the model was invalid, and so avoid problems caused by an inconsistent model.

Eucalyptus has implemented a simple but powerful preliminary SLA that lets users control the high-level network topology of their instances. Eucalyptus uses the "zone" concept introduced by Amazon EC2 to refer to a "pool" or "cluster" of computing and storage resources. A zone is a set of machines logically composed of multiple node controllers and a single cluster controller. Eucalyptus lets users specify a zone configuration for running an instance; this configuration offers different management and network performance characteristics. Depending on the configuration, a set of instances can run within one cluster or across clusters to obtain the expected performance. Eucalyptus extends the zone concept to support different service level agreements, taking into account the amount of resources obtained and the relative dynamics of the topology.
Currently, the zones provided by Eucalyptus give users several options when running jobs: they can obtain a specified cluster according to the service level agreement, select an idle cluster, or specify that one or several clusters serve them.

1.1.6 Virtual Networking

Interconnecting VM instances is one of the most important tasks in building a cloud computing infrastructure. Unlike physical networks, which are composed of physical machines with strict and complex topologies, virtual machine instances form a virtual network that is simple and easy to configure through virtualization. Virtual machine instances in a zone must have network connectivity to one another, and at least one of them must be connected to the external public network, so that the owner has an access point and can interact with the instances.

Because users have superuser privileges on the virtual machines they control, they can access the underlying network interfaces. They are therefore able to acquire system IP and MAC addresses and interfere with the system network. In addition, if two instances run on the same physical machine, a user of one virtual machine can affect and snoop on the network packets of the other, which creates security problems. On a cloud computing platform shared by different users, virtual machines that cooperate on a single task should therefore be able to communicate, while virtual machines belonging to different users should be isolated from each other.

To address these problems, each virtual machine is given two virtual network interfaces, one public and one private. The public interface is used for communication outside the set of virtual machines under the user's control, or between instances within the availability zone defined by the service level agreement.
In an environment where public IP addresses are available, these addresses can be assigned to virtual machine instances at startup so that they can communicate. In a private network served by a router that supports external communication, the public interface can be assigned a valid private address and reach the external network through the router using network address translation. The private interface of an instance is used only for communication between virtual machines across zones, which handles the case where virtual machine instances run in separate private networks but need to communicate.

The cluster controller is responsible for creating and destroying the virtual network interfaces of instances. The node controller can be configured to set up the public interface network in one of three ways:

(1) Attach the public interface of the virtual machine directly to a software Ethernet bridge connected to the physical machine's network, so that the administrator can handle the virtual machines' DHCP requests in the usual way.
(2) Let the administrator define a pool of dynamic IP addresses and a network. The cluster controller connects to this network through an interface, and a DHCP server running on the cluster controller dynamically allocates addresses to instances when they start.
(3) Define a static pool of MAC/IP address pairs. When an instance starts, the system assigns it a free MAC/IP pair, which is released when the instance terminates.

The private interface of an instance is connected through a bridge to a fully virtual Ethernet system called Virtual Distributed Ethernet (VDE) [25]. VDE is a line-level implementation of the Ethernet protocol. The VDE network is connected to the real Ethernet through the common TUN/TAP interface, which passes Ethernet packets between the Linux kernel and user space. A VDE [25] network consists of VDE switches and the cable connections between them.
VDE switches run on the node controllers and the cluster controller and use the Spanning Tree Protocol, which prevents loops while still allowing redundancy in the network. If there is no firewall, the VDE network is fully connected, that is, every VDE switch is connected directly to every other VDE switch. If there is a firewall, each VDE switch must be connected to at least one other VDE switch in the system.

To keep the system secure, the network traffic of instances must be isolated. Instances may run on the same host or on different machines in the same physical network, and the system requires that no two instances be able to inspect or modify each other's network traffic. This is achieved by using virtual LANs (VLANs) to tag the set of instances belonging to a particular user, so that the traffic of different users is isolated and forwarded separately.
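Two pieces of bookkeeping from this section lend themselves to a short sketch: the static pool of MAC/IP pairs used for public interfaces, and the per-user VLAN tag used for isolation. The Python below is a minimal illustration under stated assumptions. The class and method names are hypothetical, the addresses are made-up example values, and the tag range 2 to 4094 is an assumed choice within the 12-bit 802.1Q VLAN ID space; none of this is taken from the Eucalyptus source.

```python
from typing import Dict, Iterable, List, Tuple

AddressPair = Tuple[str, str]  # (MAC address, IP address)


class StaticAddressPool:
    """Hands out free MAC/IP pairs at instance start, reclaims them at exit."""

    def __init__(self, pairs: Iterable[AddressPair]) -> None:
        self._free: List[AddressPair] = list(pairs)
        self._in_use: Dict[str, AddressPair] = {}

    def acquire(self, instance_id: str) -> AddressPair:
        if instance_id in self._in_use:
            return self._in_use[instance_id]
        if not self._free:
            raise RuntimeError("no free MAC/IP pair")
        pair = self._free.pop(0)
        self._in_use[instance_id] = pair
        return pair

    def release(self, instance_id: str) -> None:
        # The pair goes to the back of the free list for later reuse.
        self._free.append(self._in_use.pop(instance_id))


class VlanTagger:
    """Gives every user one VLAN tag shared by all of that user's instances."""

    def __init__(self, first_tag: int = 2, last_tag: int = 4094) -> None:
        self._next = first_tag
        self._last = last_tag
        self._tags: Dict[str, int] = {}

    def tag_for(self, user: str) -> int:
        if user not in self._tags:
            if self._next > self._last:
                raise RuntimeError("no VLAN tags left")
            self._tags[user] = self._next
            self._next += 1
        return self._tags[user]

    def may_exchange_traffic(self, user_a: str, user_b: str) -> bool:
        # Frames are forwarded only between interfaces with the same tag.
        return self.tag_for(user_a) == self.tag_for(user_b)
```

In this sketch, instances owned by the same user share a tag and can reach each other, while instances of different users never share one, which mirrors the isolation requirement stated above.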