Today, many companies are building their own private PaaS platforms, and several large Internet companies operate public PaaS platforms (such as SAE, BAE, and JAE (jae.jd.com)). So what are the advantages of deploying SaaS applications on a PaaS platform? Besides the well-known ones (convenient deployment and management, savings in resources and cost), there is another benefit that I want to introduce today: an application deployed on a PaaS platform can quite easily run 7x24 without service interruption, even while it is being redeployed or updated. That is very hard for an ordinary business or an ordinary developer to achieve alone. Of course, it is not simple for the PaaS platform either; it requires strong engineering. Let's look at how a PaaS platform can provide 7x24 operation for the applications deployed on it.
Before introducing the design, one premise: the PaaS platform itself must be highly reliable 7x24.
The design mainly involves the following three modules:
(1) Application scheduling module: schedules multiple instances of an application onto different servers and racks;
(2) Application execution status monitoring module: monitors the execution status of every application instance;
(3) Graceful restart module: keeps the service running while the application is redeployed or upgraded.
I. The scheduling module
A scheduling module is standard in any PaaS platform, private or public; what differs is each platform's scheduling methods and policies. For example, a platform may schedule according to server resource usage (which can cover several kinds of resources, such as CPU or memory) or according to the number of applications already deployed on each server.
Of course, a good scheduling strategy does not rely on a single criterion; it combines several. Since today's topic is high reliability, I will focus on the part of the scheduling algorithm that ensures an application's high reliability.
Suppose an application runs three instances to improve its capacity and reliability. What is the best placement, considering high reliability only? Designing a highly reliable system means accounting for failures of servers and of the network (switches). So the scheduling module must ensure that the three instances do not run on the same server and, at the same time, do not run in the same rack. A platform with the means to do so can also schedule across data centers (probably few PaaS platforms can deploy an application across data centers).
To achieve this, the scheduling module must know how the servers are laid out (for example, which rack each server is in), or at least be able to query that information. I believe every PaaS platform has a resource management module (sometimes called the platform's server monitoring module) that provides it. Besides the server layout, the scheduling module must also know, or be able to query, which server each application instance runs on, because only then can it guarantee that later instances are not scheduled onto the same server. For example, an application is to run three instances and two have already started. Before starting the third, the scheduling module needs to know the servers and racks of the other two so that the third is placed in a different rack. Likewise, if an application is running three instances and one of them suddenly dies, a replacement has to be started, and the scheduling module again has to place it on a different server and a different rack. How the scheduling module learns that an instance has died is not its concern; the application execution status monitoring module described below handles that.
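The placement rule described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the Server type, the pick_server function, and the server and rack names are hypothetical, and a real scheduler would also weigh CPU, memory, and other criteria.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Server:
    name: str
    rack: str


def pick_server(candidates: List[Server], running_on: List[Server]) -> Optional[Server]:
    """Choose a server for the next instance of one application.

    Servers already hosting an instance are excluded. Among the rest,
    a server in a rack that hosts no instance yet is preferred.
    Returns None when every candidate already hosts an instance.
    """
    used_servers = {s.name for s in running_on}
    used_racks = {s.rack for s in running_on}

    free = [s for s in candidates if s.name not in used_servers]
    if not free:
        return None
    # Penalty 0: unused rack (best). Penalty 1: used rack, but a new server.
    return min(free, key=lambda s: 1 if s.rack in used_racks else 0)


servers = [Server("s1", "r1"), Server("s2", "r1"),
           Server("s3", "r2"), Server("s4", "r3")]
# Two instances already run on s1 (rack r1) and s3 (rack r2).
third = pick_server(servers, [servers[0], servers[2]])
```

The same function covers the failure case: when an instance dies, it is removed from running_on and the function is called again to place its replacement.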
II. The application execution status monitoring module
How can an application be highly available 7x24 if nobody even knows whether it is running? Put simply, the application execution status monitoring module watches the execution status of every application instance in real time (alive or dead) and then acts on what it observes.
The goal of the monitoring module is simple: the actual execution state of the application must match the state the user expects.
For example, if the user expects the application to be running but it has died, that is clearly wrong. If the user does not want to provide the service at the moment and expects the application to be stopped, but it is still running, that is wrong as well. And if the user expects three instances to be running at some moment but only two are, the user's requirement is not met either.
The monitoring module's job is to find such inconsistencies with the user's expectations and handle them.
So how is this done? I will use the solution from the open source PaaS platform Cloud Foundry to introduce the design of the monitoring module. Cloud Foundry has replaced its original Ruby single-process, single-point monitoring module with a highly reliable multi-process monitoring module written in Go. For more information, please visit the official website. The following is based on the latest version of that monitoring module:
The latest version of Cloud Foundry's application execution status monitoring module is itself divided into several sub-modules, each of which can run as one master process plus multiple standby processes for a highly reliable deployment (the processes can be placed on different servers and racks, as long as the network between them is reachable).
First, the state the user expects for every application is needed. A dedicated module fetches these desired states and stores them in one place (shared storage; Cloud Foundry uses etcd) for the other modules to use.
The desired state is usually kept in a database, so this module can either read the database directly or obtain the state through another service module.
This module corresponds to the fetch_desired module in Cloud Foundry's hm9000.
With the desired state in hand, the actual execution state is needed next. How is it obtained? The module responsible is called the application actual-state acquisition module, and it also writes the data it collects to the shared storage. There are two ways to collect it. One is to actively poll the execution state of every application, usually by calling a service the application exposes. The other is to have every application report a heartbeat periodically, and also report promptly when something is wrong with its execution state.
Cloud Foundry uses a variant of the second approach in which the applications do not report themselves. Instead, the component that manages all applications on a server (the DEA) reports the heartbeats of all applications on that server.
This module corresponds to the listen module in Cloud Foundry's hm9000.
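As an illustration of the heartbeat approach, here is a minimal Python sketch of how per-server heartbeats might be turned into a view of the actual state. The class and method names, the in-memory dictionary standing in for the shared storage, and the expiry window are all assumptions made for this example; this is not hm9000's real code.

```python
from typing import Dict, Iterable, Tuple

InstanceKey = Tuple[str, int]  # (application name, instance index)


class ActualStateStore:
    """Remembers when each instance was last reported by a server agent."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._last_seen: Dict[InstanceKey, float] = {}

    def record_heartbeat(self, now: float, instances: Iterable[InstanceKey]) -> None:
        """Store one heartbeat that lists the instances running on a server."""
        for key in instances:
            self._last_seen[key] = now

    def running_counts(self, now: float) -> Dict[str, int]:
        """Count, per application, the instances reported within the TTL."""
        counts: Dict[str, int] = {}
        for (app, _index), seen in self._last_seen.items():
            if now - seen <= self.ttl:
                counts[app] = counts.get(app, 0) + 1
        return counts


store = ActualStateStore(ttl_seconds=30)
store.record_heartbeat(0, [("web", 0), ("web", 1)])   # both instances alive
store.record_heartbeat(20, [("web", 1)])              # instance 0 is missing
```

An instance that stops appearing in heartbeats simply ages out of the count, so the monitoring side never needs an explicit "instance died" message.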
Once both the desired state and the actual state are available, we can analyze which applications' actual execution state differs from the state the user expects.
The results of the analysis are also written to the shared storage for the other modules to use later. This module corresponds to the analyze module in Cloud Foundry's hm9000.
Now that we know which applications' actual state differs from the desired state, we can bring them back in line. This requires a module dedicated to sending scheduling commands to the scheduling module. For example, if the user expects the application to be running but it has actually died, this module sends a command that makes the scheduling module start it. Likewise, if the user expects the application to be stopped but it is still running, it sends a command to stop it. When too few instances are running, more are scheduled; when too many are running, the surplus instances are stopped. This module corresponds to the send module in Cloud Foundry's hm9000.
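The comparison and the resulting commands can be sketched as follows. This is a simplified Python illustration, assuming both states are reduced to an instance count per application; the reconcile function and the command tuple format are hypothetical.

```python
from typing import Dict, List, Tuple

Command = Tuple[str, str, int]  # (action, application name, instance count)


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[Command]:
    """Return the start/stop commands that move `actual` toward `desired`.

    An application missing from `desired` is treated as "expected to be
    stopped"; one missing from `actual` as "no instance running".
    """
    commands: List[Command] = []
    for app in sorted(set(desired) | set(actual)):
        want = desired.get(app, 0)
        have = actual.get(app, 0)
        if have < want:
            commands.append(("start", app, want - have))
        elif have > want:
            commands.append(("stop", app, have - want))
    return commands


commands = reconcile(
    desired={"web": 3, "api": 1, "old": 0},
    actual={"web": 2, "api": 1, "old": 1},
)
```

Here "web" is one instance short, "api" already matches, and "old" should be stopped, so only two commands are produced for the scheduling module.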
These four modules together make up the application execution status monitoring module, which keeps correcting applications whose actual state differs from the desired state.
Cloud Foundry's hm9000 also provides other tool modules; interested readers can explore them on their own.
III. The graceful restart module
Why does this need to be a module of its own? Because no application goes without updates unless it has been abandoned. An update may take only a short time, but if the application is unavailable during that time, the service is interrupted, and that falls short of what the title promises: a 7x24 highly available application.
In fact, this module is the simplest, yet its effect is the most visible, and the nature of a PaaS platform makes it easy to implement. Because updating an application and running it are two separate tasks on a PaaS platform, the application can stay available while it is being updated. Until the updated application is fully able to serve requests, the old instances keep serving, so the service the application provides externally is not affected.
This module also relies on the scheduling module. During an update, the instances that are already running keep running; only after the updated instances are running properly are the old instances stopped. The start and stop actions themselves are handed to the scheduling module, so this module only needs to contain the control logic.
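The control logic can be sketched like this in Python. It is a simplified illustration: InMemoryScheduler is a hypothetical stand-in for the scheduling module, the start/is_healthy/stop interface is an assumption, and health is checked once here, whereas a real platform would poll with a timeout.

```python
from typing import Dict, Iterable, List


class InMemoryScheduler:
    """Hypothetical stand-in for the scheduling module (illustration only)."""

    def __init__(self, broken_versions: Iterable[str] = ()) -> None:
        self.running: Dict[str, str] = {}  # instance id -> version
        self.broken = set(broken_versions)
        self._next_id = 0

    def start(self, version: str) -> str:
        self._next_id += 1
        instance_id = f"{version}-{self._next_id}"
        self.running[instance_id] = version
        return instance_id

    def is_healthy(self, instance_id: str) -> bool:
        version = self.running.get(instance_id)
        return version is not None and version not in self.broken

    def stop(self, instance_id: str) -> None:
        self.running.pop(instance_id, None)


def graceful_update(scheduler, old_instances: List[str],
                    new_version: str, count: int) -> List[str]:
    """Start the new instances first; stop the old ones only if all are healthy.

    Returns the instances that serve traffic afterwards.
    """
    new_instances = [scheduler.start(new_version) for _ in range(count)]
    if not all(scheduler.is_healthy(i) for i in new_instances):
        # Update failed: remove the new instances, the old ones keep serving.
        for instance_id in new_instances:
            scheduler.stop(instance_id)
        return list(old_instances)
    for instance_id in old_instances:
        scheduler.stop(instance_id)
    return new_instances


scheduler = InMemoryScheduler()
old = [scheduler.start("v1"), scheduler.start("v1")]
serving = graceful_update(scheduler, old, "v2", count=2)
```

The important property is the ordering: at no point is the number of healthy serving instances zero, whether the update succeeds or fails.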
One drawback of the approach described above is that users may see inconsistent behavior in the interval after the new instances have started serving and before the old instances have stopped. Because the old instances have not fully stopped, one request may reach an old instance and the next a new one. Old and new instances coexist only for a very short period, so most applications can ignore this.
If even this transient inconsistency is unacceptable, it can be handled at the routing layer: make sure user requests go either all to the old instances or all to the new instances. Concretely, the routing addresses of the old and new instances must never be in the routing table at the same time.
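One way to picture this is a routing table whose entry for an application is replaced in a single step. The following Python sketch is only an illustration: the RoutingTable class, the host name, and the backend addresses are hypothetical, and a real router would also have to deal with requests that are already in flight to the old instances.

```python
import threading
from typing import Dict, Iterable, Tuple


class RoutingTable:
    """Maps a host name to the backend addresses that receive its requests."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._routes: Dict[str, Tuple[str, ...]] = {}

    def switch(self, host: str, new_backends: Iterable[str]) -> Tuple[str, ...]:
        """Replace all backends of `host` at once and return the old ones.

        Old and new addresses are never registered together, so a lookup
        sees either only old instances or only new instances.
        """
        with self._lock:
            old = self._routes.get(host, ())
            self._routes[host] = tuple(new_backends)
            return old

    def lookup(self, host: str) -> Tuple[str, ...]:
        with self._lock:
            return self._routes.get(host, ())


table = RoutingTable()
table.switch("app.example.com", ["10.0.0.1:8080", "10.0.0.2:8080"])  # old version
replaced = table.switch("app.example.com", ["10.0.1.1:8080"])        # new version
```

The addresses returned by switch are the old instances, which can then be handed to the scheduling module to stop.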
IV. Summary
The three modules above are the basic design for running applications 7x24. The design is complete on paper, but a real implementation will run into many unexpected situations, and building a 7x24 execution environment requires many further considerations. One example is the stability of the modules themselves: if the application execution status monitoring module goes down, how are failed applications restarted? A good design also needs good practice, and the design should be adjusted through practice.
Copyright notice: This is an original article by the blog author and may not be reproduced without permission.
PaaS platform 7×24-hour availability application design