Following the "Big Data Virtualization from Scratch" tutorial series, and in the spirit of knowing not only how things work but also why, this series goes inside big data virtualization. It is split into two posts to help readers understand the deployment architecture and system architecture of vSphere Big Data Extensions (hereafter BDE): the principles behind its deployment, its internal components, and the role each component plays. We hope you find it useful, and we welcome your comments.
Part 1: Serengeti virtualization applications
Part 2: System architecture of the Serengeti Management Server (this article)
System architecture of the Serengeti Management Server
The Serengeti Management Server comprises several important modules: the Serengeti Web Service, the Ironfan Service, the Chef Server, the package repository, and the VHadoop Runtime Manager. A brief description of each module's function follows.
System architecture diagram of the Serengeti Management Server
Serengeti Web Service
The Serengeti Web Service is the workflow control center of Serengeti. It exposes a REST API through which external clients can access all of the functionality Serengeti provides.
It is a web service running on Tomcat and uses several features of the Spring Framework: Spring Batch for workflow control, Spring Security for user security, Spring MVC for the REST API, and integrated Hibernate for internal data management.
The web service also handles registration and management of the vCenter extension. After Serengeti deployment completes, it registers itself as a vCenter extension server and is responsible for communicating with vCenter, managing vCenter sessions, and creating, configuring, and managing virtual machines.
An important function of the web service is computing virtual machine placement.
In a virtualized environment, a user request typically specifies the number of nodes to create and the available resources, such as which hosts and which datastores may be used. These requirements can be abstracted into a problem: create M VMs on N hosts, where each VM has certain disk and network requirements.
Under this formulation, Serengeti has developed general-purpose distribution algorithms that satisfy Hadoop's particular resource-usage requirements. A typical example is distributing nodes evenly across the available hosts. This ensures that Hadoop nodes do not cluster on the same host, providing a foundation for data reliability.
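The even-distribution idea can be sketched as a simple round-robin assignment. This is an illustrative model only, under the assumption that all hosts are equally usable; the function and node names (`place_round_robin`, `node-0`, `esx-1`) are hypothetical and are not Serengeti's actual code.

```python
def place_round_robin(num_nodes, hosts):
    """Assign each of num_nodes VMs to a host, cycling through the hosts
    so that no host receives more than ceil(num_nodes / len(hosts)) VMs."""
    placement = {host: [] for host in hosts}
    for i in range(num_nodes):
        # Cycling the index spreads consecutive nodes across distinct hosts.
        host = hosts[i % len(hosts)]
        placement[host].append(f"node-{i}")
    return placement

plan = place_round_robin(7, ["esx-1", "esx-2", "esx-3"])
# esx-1 receives 3 nodes; esx-2 and esx-3 receive 2 each
```

A real placement engine would additionally check per-host CPU, memory, and datastore capacity before committing each assignment; the round-robin order only decides the candidate host.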
To support the separation of data and compute, Serengeti also provides a node-association strategy. Specifically, m nodes are placed together on each host, of which k are nodes that hold data and j are nodes that compute, satisfying k + j = m.
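The node-association strategy described above can be sketched as grouping k data nodes and j compute nodes on each host. Again this is a conceptual sketch, not Serengeti's implementation; the names (`place_with_association`, `data-0`, `compute-0`) are hypothetical.

```python
def place_with_association(hosts, k, j):
    """Return per-host node groups: k data nodes and j compute nodes
    co-located on each host, so every host carries m = k + j nodes."""
    placement = {}
    for host in hosts:
        placement[host] = {
            # Data nodes stay pinned to the host so HDFS locality is preserved.
            "data": [f"{host}/data-{i}" for i in range(k)],
            # Compute nodes share the host and can be scaled independently.
            "compute": [f"{host}/compute-{i}" for i in range(j)],
        }
    return placement

plan = place_with_association(["esx-1", "esx-2"], k=1, j=2)
# each host carries 1 data node and 2 compute nodes (m = 3)
```

Co-locating the two groups keeps compute nodes close to the data they read, while still allowing the compute tier to be powered on or off without touching the data tier.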
A later article will detail the VM placement algorithms Serengeti supports and their appropriate application scenarios; they are not discussed further here.
Chef Service
Chef is a popular distributed software configuration management tool, widely used in data center management and operations; Facebook and Amazon, for example, are both Chef users. Serengeti uses Chef scripts to install and manage the Hadoop software after virtual machine creation completes.
Serengeti contains a series of Chef scripts that support installing and configuring multiple Hadoop distributions, including Hadoop 1.x, Hadoop 2.x, and HBase. These scripts are pre-installed on the Chef Server.
The Chef client component is pre-installed in every virtual machine Serengeti creates. Once a virtual machine is created, the Chef client downloads the scripts from the Chef Server and runs them, completing the actual installation and configuration of Hadoop.
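The download-and-run flow above can be modeled conceptually as: ask the server for the node's recipe list, then apply each recipe in order. The sketch below is a Python simulation for illustration only; the in-memory "server", the node name, and the recipe names are hypothetical stand-ins, not real Chef Server data.

```python
def fake_chef_server(node_name):
    """Stand-in for the Chef Server: returns the list of recipes
    assigned to a node (what Chef calls the node's run list)."""
    run_lists = {
        "hadoop-worker-0": ["java::default", "hadoop::datanode",
                            "hadoop::tasktracker"],
    }
    return run_lists.get(node_name, [])

def chef_client_run(node_name):
    """Stand-in for a chef-client run: download the run list from the
    server and apply each recipe in order."""
    applied = []
    for recipe in fake_chef_server(node_name):
        applied.append(recipe)  # a real chef-client would execute the recipe here
    return applied

applied = chef_client_run("hadoop-worker-0")
# applied == ["java::default", "hadoop::datanode", "hadoop::tasktracker"]
```

The key point the model captures is that the virtual machine itself pulls its configuration from the central server, so the same image can become a data node, compute node, or HBase node depending only on the run list it is assigned.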