"Editor's note" Growingio is a new generation of data analysis products based on user behavior, providing the world's leading data acquisition and analysis technology. Businesses can acquire and analyze comprehensive, data-driven users and revenue growth without having to embed sites or apps. In order to cope with the rapidly changing business growth, we have adopted a micro-service architecture at the beginning of the system design to achieve good scalability. With the passage of time, the shortcomings of micro-services are gradually reflected, the cost of human operations is too high, resulting in a decline in research and development efficiency. As a small and medium-sized team, we have been exploring for six months, using container-related technology to build a private PAAs within the team, and achieved good results.
As a start-up, we face rapid business change and rapid growth in scale. To achieve good scalability, the team adopted a microservice architecture at the start of system design. But as time went on, the drawbacks of microservices became more and more evident:
- The system architecture is too complex, and operations costs grow over time.
- Traditional operations methods are not reproducible; the cost of environment migration and scale-out is too high.
- Resource utilization is low, with a large number of dedicated machines.
These problems lead to a decline in R&D efficiency. To solve them, after six months of exploration, the team moved its microservice applications into containers, with good results. Before continuing, let me briefly introduce our technical background. We are a big-data start-up; the technology organization is divided into three layers: front end, backend services, and data. The front end uses the React framework, the backend and data sides use the Scala stack, and the backend follows a microservice architecture.
Take the first step
The first step of containerization is of course packaging the application into an image, and we tried two approaches. The first is the common practice of putting the build environment itself into the container, along the lines of the Dockerfile sketched below. The benefit is a fully reproducible image, since there are no external dependencies. The downside is very slow compilation for heavyweight languages like Java and Scala; Scala in particular is not fast to compile, so this proved too heavy for us.
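A minimal sketch of what such a Dockerfile looks like (the build image, paths, and artifact name are illustrative, and `sbt assembly` assumes the sbt-assembly plugin):

```dockerfile
# The whole build toolchain lives inside the image: reproducible,
# since nothing depends on the host, but slow for Scala builds.
FROM hseeberger/scala-sbt:latest
WORKDIR /app
COPY . .
# Dependencies are re-resolved and sources recompiled on every build.
RUN sbt clean assembly
CMD ["java", "-jar", "target/scala-2.12/app-assembly.jar"]
```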
The second approach, the one we use now, is to build on the host, with the image providing only the underlying runtime environment. The advantage is build speed equal to a native compile; the disadvantage is that it places certain requirements on the host environment. Because our team is small, dedicating a few fixed build machines is entirely feasible, whereas making every developer sacrifice long compile times just to get an application into a container would certainly be unacceptable.
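A sketch of the corresponding runtime-only image (the artifact path is illustrative); the fat jar is built on the host beforehand:

```dockerfile
# The image carries only the runtime; the jar was compiled on a
# fixed build machine, so image builds are as fast as native ones.
FROM openjdk:8-jre
WORKDIR /app
COPY target/scala-2.12/app-assembly.jar app.jar
CMD ["java", "-jar", "app.jar"]
```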
Manage Configurations
The first problem you hit when putting an application in a container is configuration management. Here are the two approaches we tried:
- Use a dedicated configuration management center such as etcd or Consul. Applications need configuration-management logic, which requires extra development effort, and once there are many applications and environments, switching back and forth in the UI to edit them is awkward. The benefit is the ability to change configuration in real time.
- Use environment variables. Most languages and tools support them, for example Typesafe Config and OpenResty's configuration; even tools that do not can be covered by an entrypoint script that renders the configuration file when the app launches. Configuration cannot be updated after startup, so the app must be restarted after a change. The advantage is that the workload is light, and combined with a specific orchestration tool it is easy to manage configuration across environments.
In the end we combined the advantages of both: configuration items that require hot updates are managed with Consul, and the majority of fixed configuration is injected through environment variables.
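For example, Typesafe Config's HOCON format falls back to environment variables in substitutions, so fixed defaults can live in the image while each environment overrides them at container start. A minimal sketch (the keys and variable names are hypothetical):

```hocon
# application.conf -- defaults baked into the image
db {
  host = "localhost"
  host = ${?DB_HOST}   # overridden only when DB_HOST is set in the environment
  port = 5432
  port = ${?DB_PORT}
}
```

`ConfigFactory.load()` resolves these substitutions at startup, which is exactly why a configuration change requires restarting the app.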
Selecting an orchestration tool
When we first started testing containers, developers mostly wrote their own scripts to control the distribution, creation, and destruction of containers. It was not long before we found we needed a unified orchestration tool to standardize operations and stop reinventing the wheel. We looked at the industry's three most popular orchestration tools: Swarm, officially supported by Docker; Kubernetes, from Google; and Marathon, a scheduling framework on Mesos.
Tool choice is a very subjective matter: however good an open-source project's ideas, what ultimately matters is whether it fits your actual situation. We crawled some GitHub statistics for the three projects and, combined with our own practice, made a subjective assessment from our team's perspective.
Swarm: the simplest architecture and the fastest to deploy, with growing visibility thanks to Docker's official support. The downside is that the Docker team has not been developing it for long and it lacks large-scale production testing. And since the latest Swarm is bundled with Docker itself, upgrading is a problem.
Kubernetes: the most dazzling star of the past year has certainly been Kubernetes; in terms of attention, Kubernetes and its competitors are not even in the same league. Its philosophy is ideal for microservice architectures: Pod, Service, Cluster IP, ReplicationController. In actual use, however, we found the complexity too high: the components are too scattered and debugging is troublesome. Last year Kubernetes introduced kubeadm, which greatly reduced deployment complexity, but there is still a gap in ease of use compared with Swarm and Marathon. Another reason we gave up on Kubernetes is that its support for big data was not strong enough, and in the long run we need to containerize more of our data applications. We have heard that the community has the kubernetes-mesos project, which attempts to combine the merits of both, but it has never really taken off.
Marathon: scheduling across a cluster is more complex than scheduling on a single host. Single-host scheduling focuses on running as many threads and processes as possible on a few CPUs, while ensuring no single process runs too long and every process gets its share of resources. Scheduling in a distributed setting is much harder, because network interaction between hosts adds latency. Unlike Kubernetes, Mesos is resource-centric: its focus is resource allocation. Mesos itself only maintains cluster resources, and decides which framework to offer resources to using the Dominant Resource Fairness algorithm. Mesos does not care about the details of how resources are used; those details are left to second-level scheduling frameworks. Marathon is such a second-level scheduler built on Mesos. Its architecture is very simple, a typical master-slave design, relying on ZooKeeper for HA.
Marathon's development started earliest, in 2013, and it was also the earliest to reach production readiness; its most famous large-scale user is Twitter, so stability is assured. Another important reason is that Mesos integrates very well with big-data applications such as Spark. Mesosphere recently launched DC/OS, built on Marathon, whose selling point is running business systems and big-data systems on the same cluster. This is very beneficial for cost compression, because the two kinds of systems peak at staggered times: business systems generally peak during the day, while big-data systems start crunching the day's data at night. Marathon itself is written in Scala and its UI in React, which matches our team's technical background and stack very well and makes secondary development easier.
Marathon feature introduction
Deploying an app on Marathon is simple: you can edit it in the UI, or use the REST API it provides. In actual testing the UI still had some unstable bugs; intuitive as it is, a buggy UI must not be relied on in a production environment. So we chose to control app updates with scripts, storing each app's definition in a JSON file.
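A minimal sketch of such a JSON definition and a scripted update (the app name, registry, and Marathon address are hypothetical):

```json
{
  "id": "/auth-service",
  "cpus": 0.5,
  "mem": 512,
  "instances": 2,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "registry.example.com/auth-service:1.0.0",
      "network": "BRIDGE",
      "portMappings": [{ "containerPort": 8080, "hostPort": 0 }]
    }
  }
}
```

```bash
# Create or update the app through Marathon's REST API.
curl -X PUT -H "Content-Type: application/json" \
     --data @auth-service.json \
     http://marathon.example.com:8080/v2/apps/auth-service
```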
One pitfall worth mentioning: newcomers often find an application stuck in the Waiting state during deployment. This means Mesos cannot satisfy Marathon's resource request, so check the resource configuration, for example whether the machines have enough CPU. Another common cause is that Mesos treats ports as resources: if a Mesos slave does not declare its own port range at startup, a specified port may fall outside the default range. So you need to specify the port resource range at startup:
```bash
--resources='ports(*):[1-32000]'
```
Constraint function
To prevent applications from drifting freely around the resource pool, Marathon provides constraints. hostname is a common constraint that lets you pin an application to a machine with a specific IP. This was useful while the team was transitioning to containers: we simply ran each application, now in a container, on the same machine it had originally been deployed on, with everything else unchanged.
The other kind of constraint is based on labels: you can assign a set of labels (attributes) to a Mesos slave at startup and then reference them at scheduling time. For example, we can specify that an application runs only on machines whose rack attribute is between 1 and 3, as in the sketch below.
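A sketch of both constraint types in an app definition (the IP and the `rack` attribute are illustrative; the attribute must have been declared at slave startup, e.g. `--attributes='rack:1'`):

```json
{
  "constraints": [
    ["hostname", "CLUSTER", "192.168.1.10"],
    ["rack", "LIKE", "[1-3]"]
  ]
}
```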
Elastic Scaling
Elastic scaling is the ability to adjust computing resources automatically according to business needs and policies: adding computing power as traffic grows and reducing it as traffic shrinks, ensuring the stability and high availability of business systems while saving on computing costs.
Elastic scaling divides into horizontal and vertical scaling. Horizontal scaling means adding more machines to increase the application's capacity to absorb traffic. Vertical scaling means strengthening a single machine's configuration, such as adding CPU cores or memory; its scalability is limited.
Stateless applications, such as typical web applications, keep their data in databases and cache middleware; once containerized they do not depend on any specific machine, making them well suited to horizontal scaling. The most important capability an orchestration tool provides is exactly this kind of scaling: together with the constraints described earlier, changing the instance count is all it takes.
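A sketch of scaling out via the REST API, reusing the hypothetical `auth-service` app from earlier; only the instance count changes:

```bash
# Horizontal scaling: raise the instance count from 2 to 8.
curl -X PUT -H "Content-Type: application/json" \
     -d '{ "instances": 8 }' \
     http://marathon.example.com:8080/v2/apps/auth-service
```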
Service discovery
Service discovery usually takes one of two forms: DNS-based implementations, or a proxy hung in front of the service. The Mesos/Marathon ecosystem provides one tool for each: Mesos-DNS and marathon-lb. Mesos-DNS provides service-name-to-IP-and-port lookup but no load balancing. marathon-lb listens on the Marathon event bus and provides TCP load balancing based on port numbers.
Because the DNS approach would require us to implement load balancing ourselves, we chose the LB scheme. marathon-lb is itself an HAProxy; by listening to the Marathon event bus it dynamically updates the HAProxy configuration file to achieve service discovery.
As mentioned above, the LB does port-based load balancing, meaning the LB uses ports to distinguish between services. Opening port 9090 on the LB shows its load-balancing status, including the externally exposed service ports and the actual IPs and ports of the corresponding backend instances.
The LB learns each service's concrete IP and port from Marathon, so we must make sure Marathon reports service ports correctly. In bridge network mode we specify both the container port and the host port; in host network mode there is no such mapping, so we must explicitly declare the ports the service requires in the app's port definition, which also tells Mesos that we need those ports as resources.
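A sketch of a host-network app declaring its port explicitly (the image name is illustrative); `requirePorts` asks Marathon to allocate exactly these ports from the Mesos port resources:

```json
{
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "registry.example.com/auth-service:1.0.0",
      "network": "HOST"
    }
  },
  "portDefinitions": [{ "port": 8080, "name": "http" }],
  "requirePorts": true
}
```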
Health Check
Application health checks are a very important feature for service stability. The usual practice is to write a script that polls a health-check endpoint and takes action, such as a restart, when the endpoint returns an error. Marathon provides a similar, and very powerful, health-check feature: it supports multiple protocols, including HTTP, HTTPS, and TCP, or you can supply your own shell command. It supports a grace period after app startup during which health checks are ignored, plus timeout, check-interval, and maximum-failure settings.
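A sketch of a health-check section using those settings (the path and numbers are illustrative):

```json
{
  "healthChecks": [
    {
      "protocol": "HTTP",
      "path": "/health",
      "gracePeriodSeconds": 300,
      "intervalSeconds": 30,
      "timeoutSeconds": 10,
      "maxConsecutiveFailures": 3
    }
  ]
}
```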
Health checking is only the first step; what matters is how the results are handled. Marathon provides a complete set of coping mechanisms, captured in a state-transition diagram showing the measures Marathon takes for each health state of the cluster.
In brief:
- I is the configured number of instances, R the number of instances running in the cluster, and H the number of healthy instances.
- R < I means fewer instances are running than configured, so Marathon scales up.
- H != R and R = I means the running count matches the configured count but some instances are unhealthy, so Marathon destroys the unhealthy instances; the cluster then drops into the R < I state and scales up.
If scaling goes smoothly and all instances are healthy, the cluster reaches the healthy steady state R = I and H = I.
Steady state is not static: if running instances become unhealthy for any reason, i.e. H != R, Marathon again destroys the unhealthy instances and the cluster re-enters the scaling state.
Update policy
Marathon can keep the number of online instances between preset bounds during an upgrade via the minimumHealthCapacity and maximumOverCapacity parameters of upgradeStrategy. Suppose you have a web application and want to maintain a certain proportion of healthy instances in the cluster during upgrades: set minimumHealthCapacity to a fraction between 0 and 1, and Marathon ensures the share of healthy instances never falls below it. Another problem, common in test environments, is applications that fail at startup; if you take no action, the cluster soon fills with dead containers. Marathon's restart-delay feature avoids this: the backoffSeconds and backoffFactor parameters control the application's restart interval.
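A sketch combining both mechanisms in an app definition (the values are illustrative): keep at least half the instances healthy during an upgrade, allow 20% temporary over-capacity, and back off exponentially on crash loops:

```json
{
  "upgradeStrategy": {
    "minimumHealthCapacity": 0.5,
    "maximumOverCapacity": 0.2
  },
  "backoffSeconds": 10,
  "backoffFactor": 2.0
}
```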
Infrastructure
Before containerization, the team used Zabbix for monitoring and alerting on basic services, and some error logs were also collected directly through the Zabbix agent. Although Zabbix often had puzzling little problems, overall it was a workable low-cost solution. In the last two years many new operations tools have emerged, and this generation tends to follow the single-responsibility principle: collecting, filtering, storing, displaying, and alerting each offer a variety of options at every step. That is far more flexible than a monolithic application like Zabbix.
For logs we use the traditional ELK stack, with a small tool, Logspout, handling container log collection and forwarding. Logspout obtains the stdout and stderr of all running containers by listening on the Docker daemon's socket. It does not store logs itself; strictly speaking it is a log router that can forward the same log to multiple destinations, and it has an extension mechanism for writing your own plugins.
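A sketch of running Logspout on a host (the Logstash endpoint is hypothetical):

```bash
# Logspout reads stdout/stderr of all local containers through the
# Docker socket and routes them onward; it stores nothing itself.
docker run -d --name=logspout \
  -v /var/run/docker.sock:/var/run/docker.sock \
  gliderlabs/logspout \
  syslog+tcp://logstash.example.com:5000
```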
Treating everything on stdout and stderr as events and processing them centrally through a unified log system has the benefit of powerful search and analysis for quickly finding the information you want. The downside is that carelessly produced logs generate a large amount of noise that hinders troubleshooting instead. A useful log message should therefore contain the following:
- Level: severity, e.g. INFO/WARN/ERROR
- Host: server IP
- Application: application name
- File: source file name
- Line: line number within the file
- Date: timestamp
- Content: the log message itself
For metrics we tried both InfluxDB and Prometheus, and both are easy to get started with. InfluxDB's clustered version is paid, whereas Prometheus supports third-party distributed storage, supports alerting, and has a more active open-source community, so we finally chose Prometheus.
Prometheus uses a pull model, periodically scraping data from exporters. An exporter is very simple: it is only responsible for converting data into the Prometheus format and exposing it on a port. Prometheus and exporters are therefore completely loosely coupled, and a failed component has no extra impact on the rest of the system. Prometheus itself offers many exporters covering containers, hosts, the JVM, and more. For us, installing too many exporters on each node was undesirable; Telegraf, from the InfluxDB project, solves this nicely: it integrates a great number of monitoring sources while supporting Prometheus as an output.
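A sketch of a scrape configuration pulling from Telegraf's Prometheus output (the targets are illustrative; 9273 is Telegraf's default port for that output):

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: 'telegraf'
    scrape_interval: 15s
    static_configs:
      - targets: ['10.0.0.1:9273', '10.0.0.2:9273']
```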
Prometheus is very powerful and supports numeric operations over metrics. For querying it provides a DSL, PromQL, which supports filtering on any dimension, with regular expressions in filter rules; we can even apply functions to a metric, such as rate, sum, and rounding.
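A couple of PromQL sketches (the metric and label names are illustrative):

```
# Per-second request rate over the last 5 minutes, with a regex
# filter on the service label:
rate(http_requests_total{service=~"auth.*"}[5m])

# Container memory usage summed per host:
sum(container_memory_usage_bytes) by (instance)
```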
Powerful as Prometheus's querying is, its own UI is simple, too weak to serve as the presentation layer of a monitoring system. We also wanted to aggregate each system's scattered monitoring information, so that troubleshooting requires only one platform rather than wasting time switching back and forth. Grafana is open-source software dedicated to exactly this. It supports a plugin mechanism, so a large number of community plugins can be used to integrate information (the grafana-xxl image is recommended, as it saves the plugin-installation step). Grafana integrates Prometheus for monitoring dashboards, and it also supports Elasticsearch, so metric and log information can be centralized on one board.
Alerting
Alertmanager is the alert-management component provided by Prometheus. Prometheus periodically scrapes data and, after each scrape, evaluates any alert rules; rules whose conditions are met trigger alerts that are sent to Alertmanager. Configuring an alert rule is very simple: it uses an expression over metrics, you specify how long the condition must hold before the alert fires, and you can attach labels to the alert itself to drive subsequent routing rules.
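A sketch of a rule in the classic (Prometheus 1.x) rule syntax that was current at the time; the metric and threshold are illustrative:

```
# Fire when the 5xx rate stays above 5% for five minutes.
ALERT HighErrorRate
  IF rate(http_requests_total{status="500"}[5m]) > 0.05
  FOR 5m
  LABELS { severity = "critical" }
  ANNOTATIONS {
    summary = "High 5xx rate on {{ $labels.instance }}"
  }
```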
Alertmanager supports routing of alerts: you can define multiple routes that classify alerts by label and send them to different destinations, configure silence rules, and use notification channels including email, SMS, and webhooks. Our most-used channel is the webhook, which hooks the team's IM tool up to receive alert information in real time.
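A sketch of routing critical alerts to an IM webhook (the addresses are hypothetical):

```yaml
# alertmanager.yml (fragment)
route:
  receiver: 'default'
  routes:
    - match:
        severity: 'critical'
      receiver: 'im-webhook'
receivers:
  - name: 'default'
    email_configs:
      - to: 'ops@example.com'
  - name: 'im-webhook'
    webhook_configs:
      - url: 'http://im-bridge.internal:8080/alert'
```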
Figure: the overall architecture of the monitoring system.
Summary
To briefly summarize some of the experience our team gained while exploring containerization:
- When choosing technology, consider your own actual situation; the most popular option is not necessarily the most suitable.
- Do not blindly believe any one tool can solve every problem; leave yourself a way out, and prefer open, extensible tools.
- Containerizing applications has some negative side effects, and the team needs to change its previous operations strategy; if the infrastructure is not in place, do not rush into production.
Q&a
Q: A few questions: 1. Do you use virtualization underneath? 2. Did you consider Rancher among the options? 3. Can health checks be configured dynamically?
A: Yes, there is virtualization underneath. We did consider Rancher, but at the time it was still early and Rancher was very unstable, so we gave up; we talked with their team online a few days ago and it has apparently been rewritten, so it is worth trying. The I in the health-check state diagram is logical: the health check itself has no such configuration item, and I can be adjusted at any time through the scale function.
Q: Do you use CI/CD tools? How do they run through the entire application lifecycle?
A: That question is too big to cover here. We use Jenkins for CI; CD tooling for production exists, but releases are still performed manually with someone watching.
Q: We are currently using your product, mainly to collect Android and iOS user-behavior data. We also run our own containers, orchestrated with Kubernetes. It is mostly stable, except that occasionally during resource distribution the state across clusters becomes inconsistent. Would switching to Marathon solve this, or is the cost too high?
A: My advice is to keep using Kubernetes. Marathon is not bug-free either; you will hit pitfalls with it too. Kubernetes is developing very well; Marathon simply fits us better.
Q: Question 1. Do you run databases in containers? What should we watch out for? Question 2. For production image updates the image is large but often only one file changes at a time; any advice?
A: Question 1. In the test environment we do run databases in containers, mainly because deployment is more convenient; we do not in production, because of performance concerns. Question 2. Choose your base image well and do not treat the container as a virtual machine; even JVM-language packages, which tend to be large, can be kept to around 200 MB.
Q: Do you build a different image for each environment, or use the same image? How does configuration management work? Can configuration be loaded dynamically?
A: We do not build a different image per environment; instead, the application's environment-dependent options are made dynamic and injected through environment variables. Dynamic loading requires writing your own tooling, implemented through Consul.
Q: How do you split a system into microservices? And how do containers improve R&D efficiency?
A: Microservices is also a very large topic. Whenever we split out a service we think repeatedly about whether the split is necessary and what it buys us. For example, we split authentication into a separate service, a split by function; there are also splits by resource to improve performance. The biggest benefit of containers is packaging the environment, which enables rapid deployment.
The above content is organized from the group sharing session on the evening of March 7, 2017.
Speaker: a backend R&D engineer at GrowingIO, working on GrowingIO's core systems; a DevOps enthusiast obsessed with technology that improves R&D efficiency.