Original article: http://www.oschina.net/question/12_32393
Cloudfoundry Conference: http://www.infoq.com/cn/zones/vmware/
In April this year, VMware unexpectedly released Cloud Foundry, the industry's first open-source PaaS. Over the past few months I have been following its evolution and have learned a great deal from its architectural design, so I think it is worth writing up and sharing.
This article is divided into two parts. The first part introduces the architecture of Cloud Foundry: its modules, the message flow between them, and how the modules coordinate with one another. The second part applies that architectural knowledge to show how to deploy a private PaaS in your own data center using Cloud Foundry.
Much of the first part draws on Pat's talk on the Cloud Foundry architecture at the VMware Cloud Forum on October 12. Pat is the head of the Cloud Foundry core team, and his talk is well worth hearing. If you attended and followed what he said, you can skip this section; beyond covering the concrete content, I am unlikely to explain it better than he did.
Architecture and Modules
The overall architecture of Cloud Foundry is shown below. (This architecture diagram, and the per-module diagrams used later, are all taken from Pat's slides.) Cloud Foundry consists of the following components:
1. Router: as the name suggests, the Router component routes all requests entering Cloud Foundry. Requests reaching the Router fall into two main types. The first are management commands issued by Cloud Foundry users from the vmc client or STS, for example the vmc apps command that lists all your apps, or a push that submits an app. These requests are routed to the app life-cycle management component, also known as the Cloud Controller. The second type are external requests for access to the apps you have deployed; these are routed to the app execution components, the DEAs. Every request entering the Cloud Foundry system passes through the Router, so you may worry that the Router becomes a single point of failure and therefore the bottleneck of the whole cloud.
However, as a cloud system, Cloud Foundry is designed to avoid single-point dependencies: every component can be scaled out in parallel and replaced, which is what guarantees scalability. This is a design principle of Cloud Foundry, and indeed of cloud computing systems in general, and we will come back to how Cloud Foundry achieves it later. For now it is enough to know that you can deploy multiple Routers to handle requests. The load balancer in front of the Routers is not in the scope of Cloud Foundry itself; Cloud Foundry only ensures that all requests are stateless, so the choice of upper-layer balancer is wide open. You can use DNS, deploy a hardware load balancer, or simply set up an nginx server as the load balancer.
In the current version, the Router component is a thin wrapper around nginx (the HTTP and reverse-proxy server). Anyone familiar with nginx knows that it can use a Unix socket file (a .sock file) for input and output. nginx is installed on every server that runs the Cloud Foundry Router component. The structure of the Router component is as follows:
When an external HTTP request enters a Cloud Foundry Router server, nginx receives it first. nginx talks to router.rb through the socket, so it is router.rb that actually processes the request. router.rb makes its routing decision based on the incoming URL, user name, password, and so on, fetches the result from the Cloud Controller or a DEA, and returns it to nginx through the connected .sock file.
router.rb is thus a logical wrapper around nginx. Anyone familiar with Cloud Foundry knows that it assigns each app a URL. If you use the cloudfoundry.com service hosted by VMware, your app's URL might be xxx.cloudfoundry.com, and no matter how many instances you scale the app to with vmc commands, they are all accessed through this one URL. The translation from that URL to a concrete route is done by router.rb.
So the core job of the Router is still URL translation and request dispatch. Beyond that, it can also handle authentication: because the Router is the single entry point for all external requests, a great deal can be done there.
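To make this concrete, here is a minimal, hypothetical sketch of the kind of lookup router.rb performs: a routing table keyed by the app's URL, with round-robin selection among the registered instances. The class and method names are invented for illustration and are not the actual router.rb code.

```ruby
# Illustrative sketch only: a routing table mapping an app's URL to its
# running instances, with round-robin selection, similar in spirit to
# what router.rb does. Not the real Cloud Foundry router code.
class RoutingTable
  def initialize
    @routes = Hash.new { |h, k| h[k] = [] }  # "xxx.cloudfoundry.com" => [{host:, port:}, ...]
    @cursor = Hash.new(0)
  end

  # Called when a DEA announces that an app instance has started.
  def register(url, host, port)
    @routes[url] << { host: host, port: port }
  end

  # Pick the next backend for this URL in round-robin order.
  def lookup(url)
    instances = @routes[url]
    return nil if instances.empty?
    instance = instances[@cursor[url] % instances.size]
    @cursor[url] += 1
    instance
  end
end

table = RoutingTable.new
table.register("myapp.cloudfoundry.com", "10.0.0.5", 35_001)
table.register("myapp.cloudfoundry.com", "10.0.0.6", 35_002)
p table.lookup("myapp.cloudfoundry.com")  # first instance
p table.lookup("myapp.cloudfoundry.com")  # second instance
```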
2. DEA (Droplet Execution Agent): first we need to explain what a droplet is. In Cloud Foundry, a droplet is the source code you submit plus a complete runtime environment supplied by Cloud Foundry, with some management scripts added (for example start and stop scripts), all compressed into a tar package. A related concept is staging an app, which is the process of building the package just described and storing it. Cloud Foundry keeps this droplet until you start the app, at which point a server running the DEA module receives a copy of the droplet and runs it. So if you scale your app to 10 instances, the droplet is copied ten times and run on ten DEA servers.
Below is the structural diagram of the DEA module:
The Cloud Controller module (introduced below) sends basic app-management requests such as start and stop to the DEA. dea.rb receives these requests and then finds the appropriate droplet on NFS. As mentioned above, a droplet is just a tar package containing a start script and a runtime environment, so the DEA only needs to unpack it and run the start script inside, and the app is up. At that point the app can be accessed and has started: in other words, a port on this server is listening, and as long as a request arrives on that port, the app can receive it and return the correct response. You can think of the DEA as a container that wraps deployed applications, a container that can be dynamically created, managed, and run (see the short sketch below).
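As a rough illustration of this step, here is a minimal sketch of "unpack the droplet and run its start script". The paths, droplet file name, start-script name, and the PORT environment variable are assumptions for illustration, not the actual dea.rb implementation.

```ruby
# Illustrative sketch only: unpack a droplet tarball and launch its start
# script, roughly what a DEA does when asked to start an app instance.
# Paths, file names, and the PORT variable are assumptions, not the real
# DEA layout.
require "fileutils"

droplet  = "/var/vcap/shared/droplets/myapp.tgz"   # assumed droplet location on NFS
instance = "/var/vcap/instances/myapp-0"           # assumed per-instance directory
port     = 35_001                                  # port handed to this instance

FileUtils.mkdir_p(instance)
system("tar", "-xzf", droplet, "-C", instance) or abort "failed to unpack droplet"

# Hand the app its port through the environment and run the start script
# packaged inside the droplet.
pid = spawn({ "PORT" => port.to_s }, "./startup", chdir: instance)
Process.detach(pid)
puts "app instance started with pid #{pid} on port #{port}"
```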
Next, dea.rb has several other duties: 1) inform the Router. As mentioned above, every request entering Cloud Foundry, including user requests to the deployed apps, is handled and forwarded by the Router, so once an app starts, the Router must be told about it; it can then forward the appropriate requests, according to load-balancing and other policies, and the new instance can begin serving. 2) Report statistics, for example to the Cloud Controller for quota control when a user wants to deploy another app. 3) Report runtime information to the Health Manager module, keeping the instance status of each app up to date. In addition, the DEA answers queries about its droplets: for example, when you ask for an app's logs through the Cloud Controller, it is the DEA that fetches them from the droplet and returns them. A sketch of the first duty, announcing a started instance to the Router, follows below.
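In Cloud Foundry this announcement travels over the NATS message bus (described later). The sketch below assumes the EventMachine-based nats Ruby gem and a router.register subject; the subject name and payload fields are written from memory of the v1 code base and should be treated as illustrative rather than the exact official message format.

```ruby
# Illustrative sketch, assuming the EventMachine-based `nats` gem
# (gem install nats) and a running nats-server. A DEA-like process
# announces a freshly started app instance so the Router can add it to
# its routing table. Subject and payload fields are illustrative.
require "nats/client"
require "json"

NATS.start do
  registration = {
    "uris" => ["myapp.cloudfoundry.com"],  # URL(s) the app is reachable under
    "host" => "10.0.0.5",                  # DEA server address
    "port" => 35_001                       # port the instance listens on
  }
  NATS.publish("router.register", registration.to_json) do
    NATS.stop  # exit once the message has been flushed
  end
end
```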
3. Cloud Controller: the management module of Cloud Foundry. Its main tasks include:
- A) add, delete, modify, and query apps;
- B) start and stop apps;
- C) stage apps (package an app into a droplet);
- D) modify an app's runtime environment, including instance count and memory;
- E) manage services, including binding a service to an app;
- F) manage the cloud environment;
- G) manage cloud user information;
- H) view the logs of Cloud Foundry itself and of each app.
This looks a bit complicated, but it can be summed up very simply: the Cloud Controller is the server that vmc and STS talk to. vmc and STS communicate with Cloud Foundry through RESTful interfaces. The Cloud Controller is a typical Ruby on Rails project: it receives JSON-format requests from vmc or STS, writes to the Cloud Controller database, and sends messages to the other modules to control and manage the whole cloud. As in other Rails projects, all of the Cloud Controller's APIs can be found in config/routes.rb. The benefit of open RESTful interfaces is third-party development and integration: when an enterprise uses Cloud Foundry to deploy a private cloud, it can use these interfaces to automate control and management of the whole cloud environment. That will be discussed in part two. Below is the Cloud Controller architecture diagram:
In the figure, the Health Manager and the DEA are external modules, and the CC database is the Cloud Controller's database. This is one place where Cloud Foundry cannot yet achieve HA: it is a potential single point. The Cloud Controller database does not see heavy concurrency, since application-level database access is handled by the underlying service modules; it stores the cloud's configuration information. Reads come mainly from DEAs as they start up, serving as the data that initializes a DEA, and the Health Manager reads the expected state from here to compare against the actual state it collects from the DEAs.
NFS provides shared storage for multiple Cloud Controllers. One important job of the Cloud Controller is staging apps, and the stored droplets must be unique across the cluster. The Cloud Controllers run as a cluster; in other words, each request may be handled by a different Cloud Controller. Consider a simple user scenario: deploying an app to Cloud Foundry. After you type the simple push command, vmc goes to work. Once it has completed a round of user authentication, checked that the number of apps you have deployed does not exceed your quota, and asked a string of questions about the app, it needs to issue four requests (sketched in code after the list):
- 1. Send a POST to "apps", creating the app;
- 2. Send a PUT to "apps/:name/application", uploading the app bits;
- 3. Send a GET to "apps/:name/" to obtain the app's state and check whether it has started;
- 4. If it has not started, send a PUT to "apps/:name/" to start it.
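To make the flow concrete, here is a minimal sketch of those four calls using Ruby's standard net/http library. The host, auth header, and JSON fields are placeholders for illustration; the real vmc client builds richer payloads and handles authentication, packaging, and error cases.

```ruby
# Illustrative sketch only: the four REST calls vmc issues when pushing an
# app, written with Ruby's standard library. Host, auth header, and JSON
# fields are placeholders, not the exact Cloud Controller protocol.
require "net/http"
require "json"
require "uri"

BASE    = URI("http://api.cloudfoundry.example.com")  # assumed Cloud Controller endpoint
TOKEN   = "fake-auth-token"                            # placeholder auth token
HEADERS = { "Authorization" => TOKEN, "Content-Type" => "application/json" }

Net::HTTP.start(BASE.host, BASE.port) do |http|
  # 1. POST /apps : create the app record.
  http.post("/apps", { name: "myapp", instances: 1, memory: 128 }.to_json, HEADERS)

  # 2. PUT /apps/myapp/application : upload the app bits (a zip of the source).
  http.put("/apps/myapp/application", File.binread("myapp.zip"),
           "Authorization" => TOKEN, "Content-Type" => "application/zip")

  # 3. GET /apps/myapp : read the app's current state.
  app = JSON.parse(http.get("/apps/myapp", HEADERS).body)

  # 4. If it is not running yet, PUT the app back with its state set to STARTED.
  unless app["state"] == "STARTED"
    http.put("/apps/myapp", app.merge("state" => "STARTED").to_json, HEADERS)
  end
end
```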
If steps 2 and 4 are handled by different Cloud Controllers and there is no guarantee they can find the same droplet, step 4 fails because the corresponding droplet cannot be found. So how do we ensure that all of these requests point to the same droplet? Using NFS as shared storage for the Cloud Controllers is the simplest way. However, this approach is not perfect in terms of security. At the VMware Cloud Forum on October 12, Pat said that the next version of Cloud Foundry will be significantly reworked here, but until that code is made public I cannot comment further.
The Cloud Controller can be understood as the core scheduling module. It obtains performance and health data from the Health Manager and, according to its scheduling policy, dynamically creates and stops instances, including registering instances with the Router. The overall approach is similar to earlier application-virtualization schemes and does not necessarily require hardware load-balancer support.
4. Health Manager: what it does is not complicated. Simply put, it collects runtime information from each DEA, then analyzes and reports on the statistics. The statistics are compared against the expected state held by the Cloud Controller, and alerts are raised on discrepancies. The Health Manager module is not yet mature, but in a cloud management stack, automated health management and analysis is a very important area with plenty of room to grow: combined with an orchestration engine it enables self-management and self-alerting of the cloud, and combined with BI techniques it can analyze usage and allocate resources sensibly. Cloud Foundry is still evolving here.
5. Services: from a source-control perspective, the Cloud Foundry service module is an independent, pluggable module, so that third parties can integrate their own services into the Cloud Foundry ecosystem. On GitHub you can see that the services live in a repository separate from the Cloud Foundry core project vcap, namely vcap-services. The service module is designed around the principle of making it easy for third-party providers to offer services, and Cloud Foundry has been very successful in this respect. From GitHub, the following services are currently available: a) MongoDB; b) MySQL; c) Neo4j; d) PostgreSQL; e) RabbitMQ; f) Redis; g) vblob. The base classes all live in the base folder.
If a third party wants to develop its own Cloud Foundry service, it inherits from and overrides the two base classes there, Node and Gateway. Some operations, such as provision, can be implemented by adding your own logic on top of the base provisioner code, along with service_error and service_message handling. How to write your own service will be covered in a detailed article on the ELC blog and is beyond the scope of this one. To understand the architecture, it is enough to know the relationships: a service and the base can scale out horizontally through this inheritance relationship, and Cloud Foundry and the apps invoke the service through the base. That is the whole, simple idea; a rough sketch of the pattern follows.
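The sketch below only illustrates the pattern of "inherit a base class and override the provisioning hooks". The class and method names are invented for illustration; they do not match the actual vcap-services Node and Gateway APIs.

```ruby
# Illustrative sketch only: the "inherit a base class and override the
# provisioning hooks" pattern used by Cloud Foundry services. Class and
# method names are invented; this is not the vcap-services API.
class BaseServiceNode
  def provision(plan)
    raise NotImplementedError, "service nodes must implement provision"
  end

  def unprovision(name)
    raise NotImplementedError, "service nodes must implement unprovision"
  end
end

# A hypothetical Redis node: the provider fills in the service-specific
# logic, while announcement, messaging, etc. stay in shared base code.
class RedisNode < BaseServiceNode
  def initialize
    @instances = {}
  end

  def provision(plan)
    name = "redis-#{@instances.size + 1}"
    @instances[name] = { plan: plan, port: 6379 + @instances.size }
    { "name" => name, "host" => "127.0.0.1", "port" => @instances[name][:port] }
  end

  def unprovision(name)
    !@instances.delete(name).nil?
  end
end

node  = RedisNode.new
creds = node.provision("free")
puts "provisioned #{creds["name"]} on port #{creds["port"]}"
node.unprovision(creds["name"])
```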
6. NATS (message bus): in the overall Cloud Foundry architecture diagram, a component named NATS sits at the center of all the modules. NATS is a lightweight message system with a publish/subscribe mechanism, developed by Derek Collison, Cloud Foundry's architect. It is open source on GitHub: https://github.com/derekcollison/nats. It is built on EventMachine and the code base is small, so you can download it and study it at your leisure.
Cloud Foundry is a multi-module distributed system that supports module self-discovery and error self-detection, with low coupling between modules. The core principle behind this is the message publish/subscribe mechanism. Each module on each server publishes messages to the message bus under several subjects, according to the kinds of messages it produces, and likewise subscribes, by subject, to the messages it needs from the modules it interacts with. For example, when a DEA is added to a Cloud Foundry cluster, it needs to shout out that it is ready to serve; it does so by publishing a message on the subject dea.start, as sketched below:
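Here is a minimal working example of that publish/subscribe mechanism, using the EventMachine-based nats Ruby gem mentioned above. The payload contents are placeholders; in Cloud Foundry the real hello message carries richer information about the DEA's identity and address.

```ruby
# Minimal pub/sub example with the EventMachine-based `nats` gem
# (gem install nats; a nats-server must be running on localhost:4222).
# The payload is a placeholder, not the real DEA hello message.
require "nats/client"
require "json"

NATS.start do
  # A module interested in new DEAs (e.g. the Cloud Controller) subscribes
  # to the subject and reacts whenever a message arrives.
  NATS.subscribe("dea.start") do |msg|
    hello = JSON.parse(msg)
    puts "new DEA announced itself: #{hello["id"]} at #{hello["ip"]}"
    NATS.stop
  end

  # A freshly started DEA announces itself by publishing on the same subject.
  NATS.publish("dea.start", { "id" => "dea-1", "ip" => "10.0.0.5" }.to_json)
end
```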
So at its core, Cloud Foundry is built around a message system. If you want to understand its inner workings, tracing the messages that flow through it is an excellent way to do so. Put another way, Cloud Foundry is a message-based distributed system, and this message-oriented architecture is the foundation of cloud characteristics such as horizontal node scaling and component self-discovery.
That concludes this brief introduction to the Cloud Foundry architecture. As the first open-source PaaS, Cloud Foundry has a great deal worth learning from, and many details are handled very elegantly; later articles may dig further into them. Although this series aims to go deep into Cloud Foundry, this article is only a first taste: it introduces the overall architecture, with the goal of giving us enough background to use Cloud Foundry to build a private PaaS inside an enterprise. To sum up, here is what I have learned from the architecture of Cloud Foundry:
1. A message-based, multi-component architecture is a simple and effective way to build a cluster. Messages decouple the cluster's nodes and enable self-registration and self-discovery, capabilities that matter greatly in large-scale data centers;
2. Appropriate abstraction layers and template patterns make it easy for third parties to develop extensions for Cloud Foundry. Cloud Foundry applies this abstraction at both the DEA and service layers, so developers can readily add new runtimes and services. For example, when Cloud Foundry launched it supported only Node.js, Java, and Ruby, but third-party providers and the open-source community quickly followed up and added PHP and Python support. This is thanks to the clean design of Cloud Foundry's DEA architecture. How to develop support for a new runtime will be discussed in later posts.