Padis: a Docker-Based Container Cloud
There are many container-cloud products on the market; for Ping An, the answer is the Docker-based Padis platform. Padis is short for Ping An Distribution, Ping An's in-house distributed platform. Built on Docker and implemented with the Mesos + Marathon framework (hereinafter "MM"), it supports rapid application creation and go-live, fast elastic scaling, and fault self-healing. The platform gives each container an independent IP, so any cluster can communicate with external or traditional IP-based systems. It offers several load-balancing approaches that adjust dynamically as containers are added or removed, and its built-in domain-name resolution allows the CMDB and performance monitoring to be updated dynamically.
Padis has grown mainly through the following stages:
- Development Environment Docker Standalone version
- Test Environment Padis Online
- Production Environment Padis Online
- Padis takes over the Ping An Financial Manager (formerly the Ping An Life app) system group
The following is a detailed description of each stage.
Development Environment Docker Standalone Version
Ping An's traditional delivery process takes a long time. From requirement initiation, through resource requests, to middleware setup, each step passes through many hands and consumes a great deal of manpower and time. Under delivery pressure and with limited staff, delivery quality declines. To address this series of delivery challenges, Docker became the initial attempt. Docker involves three things (Figure 1): repositories, images, and containers. Repositories needed little work, so the focus fell on images. In 2014 the standalone Docker version went online; the first version of Padis built the OS into the Docker image.
The first version of Padis was based on Docker plus the Docker UI. The Docker UI was heavily modified (Figure 2): the original Docker UI had no repository management and no image download or promotion, while the modified version added not only a commit-image feature but also repository management, along with the ability to search, pull, and upload images directly.
Figure 2
When the first version of Padis came out, the focus was on customizing images. Ping An has a great many business systems, but its middleware is fairly standardized, so the main direction of customization was how to package the standard middleware into an image and turn it into a standard artifact. Once that problem was solved, an image that works in the development environment still works after it reaches production. With the application package and middleware baked together, a service starts quickly, and replicating an environment is just a matter of copying the original one. Delivery can then be done at the minute or even second level.
However, how to build the middleware into a standard image raised another problem. A container's capacity is limited (the default size is only 10 GB); if everything is packed inside, the logs keep growing and the container fills up. In the financial industry, logs must land on disk and be stored; troubleshooting without logs is very inconvenient, so logs have to be retained. Putting the application inside the image caused no great trouble at first, but with the first release an awkward problem appeared: any adjustment to the image forced application-specific images to be rebuilt even when the application itself had not changed. The workable approach was therefore to remove all application packages and logs, leaving only the OS plus the middleware's fixed configuration and installation media in the image.
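The resulting split can be sketched as a Dockerfile of roughly this shape (a hypothetical illustration; the base image name, paths, and middleware choice are assumptions, not Ping An's actual build): the image carries only the OS and middleware, while application packages and logs are mounted in at run time.

```dockerfile
# Hypothetical image layout: OS + standardized middleware only.
FROM rhel6-base
# Middleware media and its fixed configuration are baked into the image.
COPY weblogic/ /opt/middleware/weblogic/
COPY standard-config/ /opt/middleware/config/
# Application packages and logs stay OUT of the image; they are mounted
# from the host or NAS when the container starts, so the image never
# needs rebuilding for an application change and the 10 GB limit holds.
VOLUME ["/opt/app", "/var/applog"]
CMD ["/opt/middleware/weblogic/start.sh"]
```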
Because adjustments had to follow user needs, this version of Padis was promoted inside Ping An and feedback was collected. Consolidating that feedback showed that standalone Docker had many defects: first, there was no cluster, so communication across hosts could not be solved; second, there was no health-check mechanism, so a failed containerized application could not self-heal; third, there was no monitoring or statistics, so resource usage was uncontrolled; fourth, few people were familiar with Docker commands, which made adoption hard. The company has many departments with different focuses; when people in a department do not understand what this is, they have no way to use Docker, so a user interface was a must. Follow-up adjustments were therefore made: images had to be transparent to users, and Docker had to move to a cluster mode.
There were two candidate cluster modes; both frameworks are briefly introduced below.
Figure 3
One was Kubernetes (k8s, Figure 3), which is written in Go and at the time ran mainly on CentOS. Choosing this framework would mean, first, spending a lot of energy on the Go learning curve; moreover, Ping An's base systems run Red Hat Linux, and converting between Red Hat Linux and CentOS is a very complicated matter. For these two reasons the MM framework was chosen instead, since it solves both problems.
Figure 4
The MM framework (Figure 4) has a simple structure and is JVM-based, for which developers are plentiful. Choosing MM meant plenty of secondary development and interface work could be done on top, while Red Hat Linux could be kept underneath. Those are the most basic problems MM solves; beyond them it also addresses server resource pooling, container-application association, fault self-healing, resource isolation, and event-driven operation.
Mesos realizes resource pooling, including the back-end configuration and management of resources. Container-application association makes dynamic scaling of containers possible. The Marathon framework itself provides fault self-healing: Ping An commonly deploys a static page that, when accessed over HTTP, returns a fixed string; if the string does not come back, the access is treated as abnormal, the original container is deleted, and a new one is started. Resource isolation is built on Mesos resource management: back-end physical or virtual machines carry labels, and when resources are allocated, different workloads naturally land on their own servers. Ping An runs many insurance businesses, and the regulator requires physical isolation, hence the need for resource isolation. As for event-driven operation: whenever Marathon starts or stops a container, a corresponding event is generated in Marathon; you can register your own API server with Marathon, listen to all events, parse each event as it arrives, and dispatch the corresponding follow-up tasks.
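The string-based health check described above can be sketched as follows (a minimal illustration, not Ping An's actual code; the marker string and the injected callables are assumptions):

```python
def check_container(fetch_page, expected="PADIS-OK"):
    """Return True if the static page returns the expected marker string.

    `fetch_page` is any callable returning the page body (e.g. an HTTP GET);
    injecting it keeps the sketch testable without a live container.
    """
    try:
        body = fetch_page()
    except Exception:
        return False          # unreachable counts as unhealthy
    return expected in body

def heal(container_id, fetch_page, delete, start):
    """Marathon-style self-healing: delete and restart on a failed check."""
    if not check_container(fetch_page):
        delete(container_id)
        start(container_id)
        return "restarted"
    return "healthy"
```

The point of the design is that the check is application-agnostic: any workload can ship the same static page, so one probe covers every system on the platform.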
With the cluster mode decided, the next step was solving communication across hosts. Some issues were historical. First, containers had no independent IPs and could not communicate with servers outside the container cluster; the solution adopted was to give each container an independent IP. Second, without IPs there was no DNS, which made crossing security zones impossible; many of Ping An's security zones distribute load through DNS, so DNS was essential. After that came load balancing and shared storage. The independent-IP and DNS solutions are described in detail below.
Figure 5
First, several reasons why an independent IP is needed (Figure 5):
- EJB logic and shared storage: traditional shared storage uses NAS, and NAS basically needs an independent IP for layer-2 interconnection.
- Multicast and HA: whether or not two containers sit on the same physical machine, multicast between them must work.
- Interfacing with the traditional environment: not every system can be moved onto the platform, so the platform must communicate with server applications outside it, which requires an independent IP.
- User habits: users need to log in to a host to view logs, configuration, and other information, so they must be able to log in not only to the physical machine but also to the container; here too the independent IP plays a big role.
Figure 6.1
Figure 6.2
Figure 6.3
What is the EJB logic? EJB has a Cluster concept (Figure 6.1). Suppose the Cluster contains three servers, 192.168.2.1, 192.168.2.2, and 192.168.2.3, all containerized. When a client accesses them, it first calls through the T3 address; with no domain name available it picks one of the three IPs at random, say 192.168.2.1. The client issues an EJB create request (Figure 6.1); 2.1 returns the cluster information as a stub message (Figure 6.2), telling the client which servers are in the cluster; the client then issues EJB channel create requests (Figure 6.3) and establishes a long-lived EJB connection to every server in the cluster, however many there are.
Figure 7
The initial version of the Padis platform had no independent IPs; communication between a container and the outside was implemented through port mapping, so the EJB logic looked like Figure 7. That created a problem. Suppose the address inside the container is the private IP 172.17.0.2 (invisible from outside) and the host IP is 192.168.2.2. When a request is sent, the 172 address cannot be seen, so the client can only connect to the 192 (physical machine) IP; through the host's port mapping it then obtains the 172 address (in the multicast case, the 172 IP is what gets returned). But even after obtaining the 172 address, the client still cannot establish a connection, because 172 is invisible to the outside; the container cannot be reached, so an independent IP was a must. One thing to emphasize: the Docker of that time did not support overlay networking.
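The failure mode can be illustrated with a small sketch (hypothetical; the addresses follow the example above): the stub returned over the port mapping carries the container's private 172 address, which is simply not routable from the client's side.

```python
import ipaddress

def reachable_members(stub_ips, client_routable_nets):
    """Return the cluster members from an EJB stub that the client can
    actually reach, given the networks routable from the client's side."""
    nets = [ipaddress.ip_network(n) for n in client_routable_nets]
    return [ip for ip in stub_ips
            if any(ipaddress.ip_address(ip) in net for net in nets)]

# Port-mapping era: the stub carries the container's private address.
print(reachable_members(["172.17.0.2"], ["192.168.0.0/16"]))
# → [] : no EJB channel can be built to any member

# With an independent, routable IP per container the problem disappears.
print(reachable_members(["192.168.2.5"], ["192.168.0.0/16"]))
# → ['192.168.2.5']
```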
Figure 8
Figure 8 gives a brief picture of the Ping An Padis network. The API server (Python-based) on the left of the diagram mainly comprises the Message Center and Network modules; the Message Center registers with the MM framework and receives all of the Marathon framework's events for parsing and processing. On the right, from top to bottom, are the MM framework, the compute nodes (physical machines running Open vSwitch, with the containers on top), and the gateway server (implemented on Linux with iptables). As Figure 8 shows, the Message Center receives the event messages, extracts container create, delete, restart, and other actions from them, and sends the corresponding actions to the relevant servers through an Ansible-based task-distribution platform.
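The Message Center's event handling can be sketched like this (a simplified illustration; the event fields match Marathon's `status_update_event`, but the action names and the dispatch mapping are assumptions):

```python
# Map Marathon task states to the network actions pushed out via Ansible.
ACTIONS = {
    "TASK_RUNNING":  "configure_ip_vlan_route",   # pipework the new container
    "TASK_KILLED":   "release_ip",                # return the IP to the pool
    "TASK_FAILED":   "release_ip",
    "TASK_FINISHED": "release_ip",
}

def handle_event(event):
    """Turn one Marathon event into (host, action) for the task-distribution
    platform, or None if the event is irrelevant."""
    if event.get("eventType") != "status_update_event":
        return None
    action = ACTIONS.get(event.get("taskStatus"))
    if action is None:
        return None
    return (event["host"], action)

print(handle_event({"eventType": "status_update_event",
                    "taskStatus": "TASK_RUNNING",
                    "host": "10.30.1.11"}))
# → ('10.30.1.11', 'configure_ip_vlan_route')
```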
Figure 9
Figure 9 is a simple illustration of a Docker host. First, the physical machine's (or VM's) NIC is added to the ovs0 bridge through Open vSwitch, using a trunk so that the machine can carry all the VLANs of the containers running on it; when a container starts, pipework assigns it an IP, gateway, and VLAN, enabling communication. The host IP here is 10.30.1.11 (the first version used dual NICs: eth0 as the management NIC and eth1 as the host NIC, mapped when the container starts). Remember to add routes to the container, otherwise Marathon's self-healing will leave a restarted container unreachable; you can also specify the relevant commands at start time so that the routes inside the container are adjusted automatically.
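The pipework step can be sketched as composing commands of this shape (hypothetical values; the `pipework <bridge> <container> <ip/prefix>@<gateway> @<vlan>` form follows the tool's documentation, but the exact invocation Ping An used is an assumption):

```python
def container_net_cmds(container, ip_cidr, gateway, vlan, bridge="ovs0",
                       extra_routes=()):
    """Compose the pipework call that attaches a container to the OVS bridge,
    plus any extra routes that must be reissued after a self-healing restart."""
    cmds = [f"pipework {bridge} {container} {ip_cidr}@{gateway} @{vlan}"]
    for net, via in extra_routes:
        # Routes live inside the container and vanish when Marathon replaces
        # it, so they are re-added on every start.
        cmds.append(f"docker exec {container} ip route add {net} via {via}")
    return cmds

for cmd in container_net_cmds("web-1", "10.30.1.25/24", "10.30.1.1", 12,
                              extra_routes=[("10.40.0.0/16", "10.30.1.254")]):
    print(cmd)
```

Regenerating these commands from the event stream is what lets self-healed containers come back with the same network setup.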
Throughout all of this the IP is a central concept, so the next topic is the IP management of the Padis platform.
For Padis IP Management, there are the following points:
- The IP address pool can generate network segment information dynamically when a user creates a resource pool.
- When a network segment is generated, its gateway and VLAN information can be configured automatically on the gateway (the production environment later dropped the gateway server).
- The IP status setting is increased to three: reserved, allocated, and unallocated.
- Fault self-healing binds the container and its IP together, so when a container is deleted its IP is released along with it.
- One container, one IP; shared storage goes over layer-3 routing.
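The IP-management rules above can be sketched with a small pool (a minimal illustration using Python's `ipaddress`; the segment, the gateway convention, and the internals are assumptions, only the three states come from the text):

```python
import ipaddress

class IPPool:
    """Per-segment IP pool with three states: unallocated, allocated, reserved."""
    def __init__(self, segment):
        hosts = list(ipaddress.ip_network(segment).hosts())
        self.gateway = str(hosts[0])          # first host kept for the gateway
        self.state = {str(ip): "unallocated" for ip in hosts[1:]}

    def reserve(self, ip):
        self.state[ip] = "reserved"

    def allocate(self):
        """Hand out the next free IP, e.g. when Marathon starts a container."""
        for ip, st in sorted(self.state.items(),
                             key=lambda kv: ipaddress.ip_address(kv[0])):
            if st == "unallocated":
                self.state[ip] = "allocated"
                return ip
        raise RuntimeError("segment exhausted")

    def release(self, ip):
        """Container deleted → its bound IP is freed with it."""
        self.state[ip] = "unallocated"

pool = IPPool("10.30.1.0/28")
print(pool.allocate(), pool.gateway)   # → 10.30.1.2 10.30.1.1
```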
Figure 10
Those are the requirements around IP; next come Ping An's requirements for DNS.
Ping An's DNS security-zone architecture has three main tiers (Figure 10): the WEB tier (presentation; WEB serves internal employees, the DMZ serves the public internet or partners), the SF tier (application logic), and the DB tier. In this three-tier architecture, two-way NAT is required at the DMZ: the SF addresses seen from the DMZ are not real, and the DMZ addresses seen from SF are not real either. Distributed coordination therefore runs into cross-security-zone problems, because under NAT the addresses you see are not the real, directly accessible ones. Likewise, providing services externally by handing out IP addresses for others to type in makes for a very poor user experience. The application layer frequently hits cross-domain problems, so domain names are a must and must be consistent; and some applications inside Ping An, single sign-on (SSO) for example, are domain-name-based, which again requires every system to have a domain name.
For LB (load balancing), the following analysis was made from the collected requirements and historical experience. First, software load balancing: a financial company's load balancing must be rock-solid, with no room for accidents, and the reliability and performance of software load balancers were not considered good enough, so they were not accepted on their own. Hardware load balancing certainly outperforms software, but it has its own problems: the cost is relatively high, and some systems simply do not need hardware-grade load balancing.
In response, Ping An proposed three solutions:
- Software load balancing: HAProxy
HAProxy is built in containers, on top of the container platform.
Advantages: implements the basic functionality; application routing for new releases (routes distributed to different applications according to rules); dynamic configuration updates (the configuration file is changed through the API and then reloaded).
Disadvantages: weak under high concurrency; poor SSL offload capability.
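The "change the configuration through the API, then reload" flow can be sketched as a backend renderer (a hypothetical illustration; the stanza layout and option choices are assumptions, not Ping An's actual template):

```python
def render_backend(app, servers, port=8080):
    """Render an HAProxy backend stanza from the app's current container IPs.

    Regenerated whenever containers are added or removed, then applied with
    a soft reload such as: haproxy -f haproxy.cfg -sf $(pidof haproxy)
    """
    lines = [f"backend {app}",
             "    balance roundrobin",
             "    option httpchk GET /health"]
    for i, ip in enumerate(servers, 1):
        lines.append(f"    server {app}-{i} {ip}:{port} check")
    return "\n".join(lines)

print(render_backend("web", ["10.30.1.25", "10.30.1.26"]))
```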
- Software load balancing: LVS + Nginx
LVS + OSPF lets the scheduler scale freely (up to 8 nodes, limited by the number of equal-cost routes the network devices allow); one LVS back end can front multiple Nginx instances.
Advantages: high concurrency capability; dynamic configuration updates are easy; good SSL offload capability (an LVS can front many Nginx instances, so offload capacity naturally grows).
Disadvantage: when the back-end Nginx set changes frequently (say, from 10 instances to 20), the equal-cost routing algorithm causes route jitter.
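The jitter can be quantified with a quick experiment (an illustration of hash-mod equal-cost path selection, not the exact algorithm of any particular router): when the path count changes from 10 to 20, roughly half of all flows land on a different next hop, so their connections break.

```python
import zlib

def next_hop(flow_key, n_paths):
    # Stand-in for a router's hash-based equal-cost path selection.
    return zlib.crc32(flow_key.encode()) % n_paths

flows = [f"192.168.{i >> 8}.{i & 255}:443" for i in range(10000)]
moved = sum(1 for f in flows if next_hop(f, 10) != next_hop(f, 20))
print(f"{moved / len(flows):.0%} of flows changed path")  # roughly half
```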
- Hardware load balancing: LTM
Automation development was done so that LTM can be configured dynamically through an interface.
Disadvantages: poor scalability and high cost.
Test Environment Padis Online
Figure 11
Figure 11 shows the test-environment framework built to address the problems above. At the top is the Portal that users see (written in JS); below it sits the API, written in Python. Next comes the MM framework; Log is the logging cloud, and alongside it sit the authentication module and the hardware load balancer LTM. The red block in the figure is the repository, which holds all the development-environment images, including user-authentication configuration, base-environment tuning, and so on; below the repository is storage, which is mainly NAS. Containers can run on both virtual and physical machines, and by the ops-and-dev principle that fewer layers mean fewer points of failure, the platform's compute nodes are mostly physical machines. Finally, the gateway is set up in the network module.
Figure 12
Figure 12 shows the logic. Entry is through the Portal, which calls the API to create a network segment or an application. The API is divided into authentication, DNS, Traffic (LB), Network, and Cache modules. The original idea was to bundle the database in as well, but Ping An's databases are too large (often several TB), so that idea was abandoned. The Padis API handles user authentication and DNS additions and deletions. As Figure 12 shows, all modules are independent of one another; every system comes in through the Portal and sends requests to the API, which performs the corresponding operation for each request. All operations are sent directly to the Marathon framework, which in turn feeds events back to the API, and the API acts on that feedback.
Production Environment Padis Online
The requirements of the production environment are higher than those of the test environment. The main modifications made for production were:
- Gateway. The test-environment gateway's performance did not meet the requirements and its stability was questionable, so the gateway server was dropped; the production environment uses network devices throughout.
- CMDB. The test environment had its own independent CMDB; production must dock with the traditional CMDB and write data into it, so a CMDB interface and data import had to be developed.
- Monitoring and alerting. A Zabbix monitoring platform had to be built and connected to the traditional monitoring platform.
- Performance data collection. Fault analysis rests on analyzing performance data and drawing conclusions from it, and performance data also drives capacity decisions, such as whether to scale dynamically. (The initial idea was to automate scaling, but the problem is that under full automation, the back-end resources might not withstand an attack.)
- DNS retrofit. Dock with the production network's DNS and implement dynamic DNS updates. (A dedicated DNS module was written for this, docking with Ping An's DNS; the internal DNS acts as a node with its own separate configuration, and all configuration updates are done internally.)
- Log storage. The test environment used Ceph instead of local disks, turning all the spare physical disks into a Ceph cluster, but the Ceph used in testing proved immature, so production switched to NAS; logs are also shipped to the logging cloud for analysis, processing, and archiving.
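The dynamic DNS updates mentioned in the DNS retrofit above can be sketched by generating an `nsupdate` batch (`nsupdate` is the standard BIND update tool; the server, zone, and record names here are made up, and whether Ping An's DNS module actually uses nsupdate is an assumption):

```python
def nsupdate_script(server, zone, name, ip, ttl=60):
    """Build an nsupdate batch that repoints `name` at a container's new IP,
    e.g. after Marathon replaces the container during self-healing."""
    fqdn = f"{name}.{zone}"
    return "\n".join([
        f"server {server}",
        f"zone {zone}",
        f"update delete {fqdn} A",          # drop the stale record
        f"update add {fqdn} {ttl} A {ip}",  # point at the new container
        "send",
    ])

print(nsupdate_script("10.0.0.53", "padis.example.com", "web", "10.30.1.25"))
```

A short TTL keeps clients from caching the address of a container that self-healing has already replaced.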
Figure 13
After the above transformation, the production-environment framework becomes Figure 13 (with the CMDB module added).
Padis Takes Over the Ping An Financial Manager System Group
In the first half of this year, Padis took over the Ping An Financial Manager (formerly Ping An Life) system group; the platform that system group uses is Padis.
In the traditional environment, going from zero to online takes at least 3 days even at its fastest, but Padis completed the whole process from zero to online in only 5 minutes, including all the debugging steps. Padis then built an internet egress and carried out the LVS + OSPF retrofit, serving traffic with 50 Nginx instances behind multiple LVS paths. Because all the configuration had been standardized, a package could be uploaded and used immediately, which is why so little time was needed. That was also Padis's first appearance in the production environment, and it yielded one lesson: isolate this group's interactions with other systems, so that a marketing campaign cannot affect the normal operation of other systems.
About the author
Wang Yaowu
He started work in 2008 and joined Ping An Technology's infrastructure team in 2010, where he was assigned to middleware; in four years he went from not knowing what middleware was to becoming a middleware expert, living through several of Ping An's major projects and transformations. In 2014 he began to step away from the day job to work on landing Docker, building a team to create the rapid-deployment platform Padis.
More container-cloud practice articles are available on the Qiniu Cloud blog.