DockOne WeChat Share (120): Building a private container cloud on Kubernetes in practice

"Editor's note" This talk describes how YeePay built its private container cloud from 0 to 1. It covers technology selection, the theoretical foundations, the Kubernetes-based container cloud itself, and the challenges and pitfalls encountered while rolling out CI/CD.

Construction Background and objectives

Before Docker became popular, ensuring the quality and speed of software delivery was difficult for most businesses. Business complexity brings application complexity, and with thousands of different applications, the operations department must constantly deal with challenges from different applications and environments. In companies with a low degree of operations automation in particular, manual operation becomes the usual way to solve problems; it lengthens the software delivery cycle and raises the risk of human error. Docker appeared in 2013, and its "Build once, Run anywhere" property changed how software is delivered. After carefully studying Docker, we decided to build our own private container cloud, with the following background and goals:

Automating operations is the most important goal of the project and the foundation for all the later goals. It directly determined our technology selection.

Technology selection

We began researching technologies in June 2015 and started the container cloud project in early August 2015. The first question was which container orchestration engine to choose. The options were Swarm, Mesos, Kubernetes, or even developing our own cluster orchestration, and we investigated each of them carefully:

Swarm was at version 0.4 at the time and its functionality was relatively simple. Its advantage was a simple technology stack that a small team could manage. However, it was not yet a stable release, and although it was developing quickly, it did not solve our existing problems, so Swarm was not a priority.

Mesos was at version 0.23. It could handle large-scale container orchestration, but it focuses on resource abstraction, which did not match our scenario of mostly Java web applications. The Mesos technology stack was also too different from our existing one, so we dropped this option.

We also considered developing our own container orchestration engine. After careful discussion we concluded that matching the functionality of the three open-source options would require a large R&D investment and might still not achieve the expected results, giving a low return on investment. Moreover, the container cloud is underlying infrastructure, so the choice called for caution: if a self-developed project failed, we could end up far from mainstream container technology, and that opportunity cost was too high. The self-developed route was therefore rejected as well.

Kubernetes was our final choice. It was at version 1.0.2 and already "production ready". The most important reason for choosing it was its advanced design, which suits our company's mainstream applications well: a Java web app is a long-running task, and the Kubernetes "Replication Controller" supports that very well. Its application-centric philosophy and active community were also strong reasons. The three-month technology selection thus came to an end, and we decided to use Kubernetes to build our private container cloud platform.

Theoretical basis and principles

After we decided on Kubernetes as the container orchestration engine, the debate about the choice continued for a long time. Kubernetes had relatively few users in China then, and successful cases were hard to find. We needed to study Docker, Kubernetes, and related container technologies in depth to be sure the decision was right, which was critical to building our container cloud. After a lot of research and discussion, we found that the container cloud rests on a complete set of theoretical foundations, from which we derived our principles for building it:


    • Immutable infrastructure. The immutability of Docker images makes infrastructure easier to maintain: when infrastructure is damaged or needs to change, it is replaced outright rather than repaired, which requires the cost of replacement to be low enough, and Docker clearly achieves that. When a running Docker container misbehaves, we no longer debug it over SSH in the traditional way; we kill the container and start a new one. Replacement is fast and repeatable, and any operation can be rolled back at any time, which makes it safe and reliable. For production operations the idea of immutable infrastructure is particularly important, because many incidents are caused by direct modification of the production environment.

    • Infrastructure as code. Infrastructure is managed the way code is managed: every piece of infrastructure is "describable", such as the Node concept in Kubernetes, and these descriptions should be managed as part of the code.

    • Programmable infrastructure. Infrastructure should not only provide compute, storage, and network resources but also expose a programmable interface, so that upper-level applications can use it more flexibly. We took this into account from the start of the project: the container cloud platform has a complete external RESTful API that is available to upper-level applications and even to external applications.


Building the container cloud correctly also requires some principles. "Build once, run anywhere": one Docker image runs through every stage from QA to production, and the QA and production images are not allowed to differ. "All in one": for historical reasons, multiple Java web apps may run in the same Tomcat; we require that each Docker image run only one web app.

Being application-centric is our most important principle and the starting point for building the container cloud. It keeps our focus on the application rather than on abstracting and scheduling compute resources. Our ideal is to manage the entire life cycle of an application "gracefully" and, along the way, to abstract resources well and raise resource utilization.

Layered governance: the container cloud governs the infrastructure, and the application governance layer governs the applications above it. From SaaS to PaaS to CaaS, each layer is governed separately, the layers call each other through interfaces, and no layer intrudes on another.

Building a Kubernetes-centric container cloud


The goal of the container cloud is to manage applications, that is, to manage the Docker containers that correspond to them. This requires us to build the container cloud around Kubernetes rather than Docker. Docker serves only as the tool for packaging, delivering, and running applications, and all APIs are designed against Kubernetes.

The container cloud must be a highly available infrastructure that supports multiple data centers. Applications need high availability guaranteed along several dimensions, and rapid delivery through a deployment pipeline and CI/CD. In addition, the container cloud carries a further goal: to pave the way for the next 2-4 years of technology development, laying the groundwork for the cloud-native transformation of applications and the DevOps practice of the whole technical team.

The first step for the container cloud is full life-cycle management of applications: going live, rolling back, upgrading, scaling out and in, and going offline. For historical reasons, some application configuration is coupled to the environment, and some applications hard-code external dependencies (such as the IP address of a service provider); these applications must be reworked before they migrate to the container cloud.

The container cloud must support multiple active data centers to ensure high availability at the data center level. For elastic scaling, our plan is to implement manual scaling first and automatic scaling afterwards; for automatic scaling, to scale on CPU/memory first and on custom metrics afterwards. Unlike most approaches to building a container cloud, we first addressed operations automation in production and only then the container build problem (i.e., CI/CD). For networking we chose flannel on a 10 Gigabit network; flannel carries a performance cost, but it more than meets our actual needs. For storage we use Ceph RBD, which has been very stable for more than a year. We have tried CephFS, but it has not gone into formal use because of limited team capacity and the possible risks.

Highly Available infrastructure

The container cloud must be a highly available infrastructure that guarantees the availability of applications and services along several dimensions:

At the application level, each application has at least 3 replicas, guaranteed by the Kubernetes ReplicationController/ReplicaSet. Every application is required to expose a health check interface, and liveness and readiness probes are configured so that a failing instance is detected promptly and replaced with a new one.
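
As an illustration of the replica and health check policy above, the following is a minimal sketch that builds a Deployment manifest as a Python dict. It is not the speaker's configuration: the `/health` path, port 8080, the probe timings, the image name, and the use of the current `apps/v1` API group are all assumptions chosen for the example.

```python
import json


def _http_probe(path, port, initial_delay, period):
    """Build an HTTP GET probe definition."""
    return {
        "httpGet": {"path": path, "port": port},
        "initialDelaySeconds": initial_delay,
        "periodSeconds": period,
    }


def build_deployment(name, image, replicas=3, port=8080, health_path="/health"):
    """Return a Deployment manifest (dict) with liveness and readiness probes."""
    if replicas < 3:
        # Policy from the text: at least 3 replicas per application.
        raise ValueError("at least 3 replicas are required")
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Liveness failure: the kubelet restarts the container.
        "livenessProbe": _http_probe(health_path, port, 30, 10),
        # Readiness failure: the pod is taken out of Service endpoints.
        "readinessProbe": _http_probe(health_path, port, 5, 5),
    }
    return {
        "apiVersion": "apps/v1",  # assumption: current API group
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical application name and image.
    manifest = build_deployment("demo-web", "registry.example.com/demo-web:1.0")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.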

The Kubernetes components must also be highly available, above all the etcd cluster, and backing up etcd data regularly is good practice.

To ensure high availability at the data center level, we deployed a Kubernetes cluster in each data center. Each cluster can survive independently, and the data centers back each other up.

Compute resource QoS and overselling


Because resources are limited, engineers tend to focus too much on the resource utilization of a single machine. The resource sharing and isolation mechanisms that Docker builds on (cgroups, namespaces) gave us a new understanding of utilization. Especially with a container orchestration engine, resources should be considered at the cluster level rather than per machine. Likewise, considering utilization across a whole data center, and even across multiple data centers, is essential.

While improving resource utilization and reducing costs, we must balance service QoS against utilization. Our principle is to maximize resource utilization while guaranteeing quality of service.

In the Kubernetes resource model, pod-level QoS is divided into three classes: Guaranteed, Burstable, and BestEffort. We map the priority of our applications onto these three classes and set the overselling standards accordingly.

Our QoS standards for applications:

    • Components that ship with Kubernetes use Guaranteed

    • Important components and applications such as ZooKeeper, Redis, and the user service use Guaranteed

    • Ordinary applications (Burstable) are graded by importance, with three CPU overselling ratios of 2, 5, and 10; the 10x ratio is for back-office (BOSS) applications, most of which see little traffic. Memory uses a fixed overselling ratio of 1.5.


Note that BestEffort should not be used in production, because it leads to nondeterministic behavior.
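
The overselling standards above can be turned into resource settings with a little arithmetic. The sketch below assumes that an overselling ratio means limit divided by request, which the talk does not state explicitly; the function names and the sample numbers are hypothetical.

```python
CPU_OVERSELL_RATIOS = (2, 5, 10)   # ratios named in the text
MEMORY_OVERSELL_RATIO = 1.5        # fixed memory ratio named in the text


def guaranteed_resources(cpu_millicores, memory_mib):
    """Guaranteed class: requests equal limits for every resource."""
    res = {"cpu": f"{cpu_millicores}m", "memory": f"{memory_mib}Mi"}
    return {"requests": dict(res), "limits": dict(res)}


def burstable_resources(cpu_limit_millicores, memory_limit_mib, cpu_ratio):
    """Burstable class: requests are the limits divided by the overselling ratio.

    Assumption: ratio = limit / request.
    """
    if cpu_ratio not in CPU_OVERSELL_RATIOS:
        raise ValueError(f"cpu_ratio must be one of {CPU_OVERSELL_RATIOS}")
    cpu_request = max(1, cpu_limit_millicores // cpu_ratio)
    memory_request = max(1, int(memory_limit_mib / MEMORY_OVERSELL_RATIO))
    return {
        "requests": {"cpu": f"{cpu_request}m", "memory": f"{memory_request}Mi"},
        "limits": {"cpu": f"{cpu_limit_millicores}m", "memory": f"{memory_limit_mib}Mi"},
    }


if __name__ == "__main__":
    # A low-traffic back-office app: 2 CPU / 3 GiB limits, 10x CPU overselling.
    print(burstable_resources(2000, 3072, 10))
    # An important component: requests == limits.
    print(guaranteed_resources(1000, 1024))
```

With a 10x ratio, a container limited to 2000m CPU requests only 200m, so the scheduler can place ten such containers in the CPU capacity that one fully reserved container would occupy.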

Container Cloud Management Platform


As more and more applications migrate to the container cloud, a visual management system is needed. We built a web management system on the native Kubernetes API; through calls to APIs such as namespace, resourcequota, deployment, service, and endpoint, it partitions resource quotas and manages the application life cycle.

The biggest usability challenge for the container cloud platform is troubleshooting. The container cloud is ultimately delivered to developers, who do not know Kubernetes, and that makes troubleshooting difficult. For now we only expose a kubectl exec console to users over WebSocket, or let them view logs in the log center (EFK). We have no better solution; if you have one, please share it.

The future vision for the container cloud is to visualize the entire data center, so that operations staff can see the real-time state of all data centers at a glance. Achieving this is of course quite difficult.

Monitoring of the container cloud uses Heapster, and we are migrating to Prometheus.

Log collection uses the mainstream EFK combination.

The basic functions of the container cloud management system are as follows:


The log collection scheme looks like this:


We provide a common logging component for Java applications, Appenders, which sends the Java log stream to Fluentd. The output goes to a Fluentd relay because this runs in parallel with the existing log center. The rest is no different from the mainstream EFK model. Running Fluentd as a DaemonSet, or running Fluentd alongside the app as a sidecar, are also good choices.

In the container era, cloud-native applications are the inevitable choice. For the principles of building cloud-native applications, refer to the Twelve-Factor App.

The container cloud management system is itself a cloud-native application and also runs in Kubernetes. Unlike traditional release tools, it can manage its own life cycle.

"Container based, microservices oriented" is what cloud native advocates. Only after applications have been transformed to be cloud native can the container cloud deliver its full benefit.

CI/CD Construction


Following our roadmap, we first freed up the operations work of the production environment and then turned to application build and integration. The container cloud management system has now largely replaced manual day-to-day operations, and frequent manually triggered builds have become the bottleneck for moving the container cloud forward. Building a CI/CD platform has therefore become urgent.

After a preliminary investigation, we decided to build the CI/CD platform on the GitLab + Jenkins + Docker Registry stack. To unify technical standards and minimize uncertainty in the build process, we generate the Dockerfile automatically rather than letting developers write it. We use a stable-trunk approach: a merge request (MR) automatically triggers the build, which runs unit tests, compilation, packaging, and the Docker build. The container cloud interface shows the build in real time, and when it finishes the user receives a message with the result. Finally, the Docker image produced by CI is pushed to the registry of the QA environment.

For us, the most important and most difficult part of CI/CD is automated testing, especially automated integration testing, which we are still working on.

In the CI process we also check the code's dependency libraries, track code versions, and make Docker images self-describing, so that an image is traceable from the moment it is built through testing and production. This makes it easy to find problems and to improve the CI process continuously.

Standardizing common technology stacks and configurations is also an important goal of CI. The quality of the images CI produces (analogous to a defect rate) is an important criterion for evaluating the CI system.

The workflow of our CI/CD platform:

The entire deployment pipeline, covering an image's journey from build to production deployment, with feedback on the process and results:

The problems and challenges encountered

So far, building the container cloud has posed few major technical challenges, but we have run into some pitfalls.

We encountered RBD disks that stayed locked, so that a newly created pod could not mount them. The fix is to unlock the RBD disk manually, after which the new pod mounts it automatically.

We hit a Kubernetes bug: the name of a ReplicaSet is generated from the Deployment's pod template using the Adler algorithm, and hash collisions are very frequent. During an upgrade, the Deployment then cannot create the newest ReplicaSet and the upgrade fails. The workaround is to replace the Adler algorithm with the FNV algorithm to reduce the collision frequency. This is obviously not the final solution, which is still under discussion; anyone interested can take part: https://github.com/kubernetes/community/pull/384, https://github.com/kubernetes/... 29735.
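
To see why Adler-32 collides easily on short inputs while FNV spreads them better, here is a small self-contained comparison. It does not reproduce the Kubernetes code path, which hashes a serialized pod template; the two three-byte inputs are constructed by hand for the demonstration, and FNV-1a (32-bit) is used as one FNV variant, which is an assumption because the talk does not say which variant the fix used.

```python
import zlib

FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


if __name__ == "__main__":
    # Adler-32 is built from two running sums, so inputs whose byte sum and
    # position-weighted byte sum both match produce the same checksum.
    first, second = b"aca", b"bab"
    print("adler32:", hex(zlib.adler32(first)), hex(zlib.adler32(second)))
    print("fnv1a  :", hex(fnv1a_32(first)), hex(fnv1a_32(second)))
```

The two inputs differ in every byte, yet Adler-32 maps both to the same value, while FNV-1a separates them. FNV still has only 32 bits of output, so collisions remain possible, which matches the remark above that switching algorithms is not a final solution.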

Because we have not had time to migrate to Harbor, we use Docker Registry 2.1 directly as the private image registry. When using its RESTful API, note that _catalog returns only the first 100 images in alphabetical order by default, and the client has to handle paging.
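
The paging that the client has to handle can be sketched as follows, using the `n` and `last` query parameters of the Registry v2 `_catalog` endpoint. This is an illustrative sketch rather than the speaker's code: the registry URL is hypothetical, authentication is omitted, and for brevity the loop stops when a page comes back shorter than requested instead of reading the `Link` response header.

```python
import json
import urllib.parse
import urllib.request


def fetch_catalog_page(base_url, n, last=None):
    """Fetch one page of GET /v2/_catalog and return its repository names."""
    query = {"n": n}
    if last:
        query["last"] = last  # resume after the last name of the previous page
    url = f"{base_url}/v2/_catalog?{urllib.parse.urlencode(query)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["repositories"]


def list_all_repositories(fetch_page, page_size=100):
    """Collect every repository name by requesting pages until one is short."""
    repos, last = [], None
    while True:
        page = fetch_page(page_size, last)
        repos.extend(page)
        if len(page) < page_size:
            return repos
        last = page[-1]


if __name__ == "__main__":
    # Hypothetical registry address; replace with a reachable registry.
    base = "https://registry.example.com"
    names = list_all_repositories(lambda n, last: fetch_catalog_page(base, n, last))
    print(len(names), "repositories")
```

Passing the page fetcher in as a function keeps the paging loop independent of the HTTP client, so it can be exercised against a fake catalog without a running registry.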

Migrating applications to the container cloud consumed the most effort in the whole project. Existing applications had to be reworked to fit the ideas behind the container cloud, and the migration brought many challenges. The biggest was migrating Dubbo applications: flannel's overlay network meant that containerized Dubbo applications could not connect with applications outside the overlay network. We eventually modified the network strategy so that Dubbo applications can migrate to the container cloud seamlessly.

The next stage of container cloud work focuses on driving applications toward cloud native and microservices.

The biggest challenge for the container cloud is a change of mindset. Container technology has changed the ecosystem of software delivery, and engineers need to build applications with new ideas. How to help engineers make that change of mindset successfully is something every container cloud builder needs to consider seriously.

Q&A

Q: How fine-grained is automated cluster deployment in CI? For example, if I fix a bug by changing one class file and, after local testing, want to deploy it online for A/B testing, is it deployed to the cluster servers automatically through CI?

A: Our approach is to trigger a rebuild whenever anything changes. That suits our company; you can choose the granularity that fits your situation.
Q: Can you say more about the problems encountered in migrating Dubbo applications and how you solved them?

A: The solution is fairly crude. With flannel, containers can reach the outside, but outside requests cannot get into the containers. Because our number of physical machines is limited, we configured static routes: as long as a request reaches the node, it can find the container.
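
The static-route approach in this answer could be scripted roughly as below. This is a sketch under assumptions: the node IPs and per-node flannel subnets are hypothetical, the commands are meant for hosts (or a router) outside the overlay, and it presumes those hosts can reach each node directly so that the node can forward traffic to its containers.

```python
import ipaddress


def static_route_commands(node_subnets):
    """Build one `ip route add` command per node.

    node_subnets maps a node IP to the flannel subnet leased to that node.
    """
    commands = []
    for node_ip in sorted(node_subnets, key=ipaddress.ip_address):
        subnet = ipaddress.ip_network(node_subnets[node_ip])  # validates CIDR
        commands.append(f"ip route add {subnet} via {node_ip}")
    return commands


if __name__ == "__main__":
    # Hypothetical node IPs and container subnets.
    routes = static_route_commands({
        "192.168.1.11": "10.244.1.0/24",
        "192.168.1.12": "10.244.2.0/24",
    })
    print("\n".join(routes))
```

The number of routes grows with the number of nodes, which is why the answer notes that this only works because the number of physical machines is limited.
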
Q: What is the purpose of generating the Dockerfile automatically? What are the advantages?

A: There are two aspects. The first is standardization: most of our applications are Java web apps and are very similar, so the Dockerfile can be generated automatically and developers have no particular need to write their own. In addition, the CI platform provides a Dockerfile editor, so one can be written by hand for special cases, but that is very rare. CI is closely tied to each company's business situation, so it has to be analyzed case by case.
Q: I have several pods behind one service, and I want each pod to handle a single task at a time. But the service selects pods randomly, so several task requests may land on one pod, which does not meet my requirement. How can I solve this?

A: As I understand it, the problem you want to solve does not quite fit the scenario of one service mapping to multiple pods. I suggest considering other implementations, such as multiple service-pod pairs.
Q: How is Kubernetes master high availability designed? Are the multiple data centers in an active-standby relationship?

A: The API server is stateless, so multiple instances can be deployed behind a load balancer. The scheduler and controller manager are stateful and can run in active-standby mode. Kubernetes is quite stable (of course, our scale is small).
Q: Which Kubernetes version does your company use? Can the RBD locking problem only be solved by manual unlocking? Do you have any other approach?

A: We went live early, so the production system is still on version 1.2, and we are upgrading to 1.6. For RBD I have only tried manual unlocking and have not tried other methods.
Q: What did you choose for distributed storage with Kubernetes, and what problems did you encounter in using it?

A: Our applications do not mount disks, so there are not many scenarios that use Ceph. We have had no problems with RBD. We do have scenarios that need shared storage (a file system), but with limited staff we have had no capacity to try CephFS or anything else, and that is a problem.
Q: What is the purpose of Kafka in the EFK architecture? Is it to guarantee that logs are not lost, or to increase throughput?

A: Mainly buffering. We also process some logs individually: they are read from Kafka, processed, written back into Kafka, and finally sent to Elasticsearch.
Q: Can you describe in detail one of the important goals of the CI system, the standardization of common technology stacks and configurations? What exactly has been standardized, and how is it implemented technically?

A: For Java applications, we provide only JDK 7 and JDK 8, specify the location of the log directory, and provide a standard log4j. Separating configuration from code and keeping the WAR package independent of the environment are mandatory requirements.
Q: How is image replication between Docker registries implemented?

A: We run a script: docker save, scp, docker load. This approach is rather crude; doing it directly with Harbor is recommended.
Q: How is automatic Dockerfile generation implemented?

A: It is generated from a template. For a Java web app, for example, the log output directory and so on are fixed in advance, and the only variable parts are a few things such as the project name. The CI system knows the name, so the Dockerfile can be generated automatically.
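
Template-based generation as described in this answer might look like the following minimal sketch. The template contents are assumptions made for illustration: the base image names, the log directory, the WAR path, and the Tomcat layout are hypothetical, and only the restriction to JDK 7 and JDK 8 comes from the talk.

```python
from string import Template

# Hypothetical base images, one per supported JDK.
BASE_IMAGES = {
    "7": "registry.example.com/base/tomcat-jdk7",
    "8": "registry.example.com/base/tomcat-jdk8",
}

DOCKERFILE_TEMPLATE = Template("""\
FROM ${base_image}
ENV APP_NAME=${project}
# Log directory fixed by convention (hypothetical path)
VOLUME /var/log/${project}
COPY target/${project}.war /usr/local/tomcat/webapps/ROOT.war
EXPOSE 8080
CMD ["catalina.sh", "run"]
""")


def render_dockerfile(project, jdk="8"):
    """Fill the template; the project name is the main variable part."""
    if jdk not in BASE_IMAGES:
        raise ValueError("only JDK 7 and JDK 8 are supported")
    return DOCKERFILE_TEMPLATE.substitute(
        base_image=BASE_IMAGES[jdk], project=project
    )


if __name__ == "__main__":
    # Hypothetical project name supplied by the CI system.
    print(render_dockerfile("order-service"))
```

Because everything except the project name and JDK version is fixed in the template, every generated image follows the same conventions, which is the standardization benefit mentioned in the earlier answer.
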
Q: How is kube-proxy's performance? Also, how do you give certain containers fixed IPs?

A: Our scale is small and we have no performance bottleneck. From 1.2 (I am not sure of the exact version) kube-proxy is a pure iptables implementation, so it should not be that bad. Some in the industry replace it with HAProxy and the like; personally I see no need. We have not implemented fixed IPs for specific containers because we have no such scenario. As a compromise you can use a NodePort. Personally I think fixed IPs should be used as little as possible.
Q: Can you elaborate on the self-description of images?

A: Every image carries a Dockerfile that describes its build process.
Q: How are dependencies between services implemented on your container cloud platform today? How do you distinguish environments? What are the options for application health checks and tracing?

A: What do you mean by dependencies between services? Java applications still call each other through Dubbo, and non-Java applications call through a Kubernetes service. We deployed Kubernetes clusters in the different environments, and each cluster also has its own management system, so we know which environment each system corresponds to. Health checks are implemented through the Kubernetes health check mechanism, livenessProbe.
Q: Can you talk about multi-data-center disaster recovery? Is there one cluster per data center, and are they all cold standby?

A: We have three data centers in production. On each release we send a request to every data center; it is not cold standby but multi-active.
Q: What are the details of the monitoring system? What is monitored (CPU, memory, pod events, and so on), and what about alerting?

A: Good question. We do very little in this area and have only just drawn up a detailed plan for monitoring standards. Our current solution is to output metrics such as CPU to the log center, which handles them (including the monitoring and alerting part). Pod events and the like are not yet monitored or alerted on.
Q: How do you make logs easy to view for troubleshooting, for example the logs at startup?

A: Application logs are viewed through the log center (ELK), and startup logs are viewed in the container cloud interface through the Kubernetes API.
Q: Many components have IP whitelists, and IPs in a Kubernetes cluster change often. How do you solve this?

A: Either rework those components, or apply restrictions at the network layer (such as Calico or others). Try not to solve it at the Kubernetes level.
Q: Is the container management platform self-developed? What language is it written in? Is it entirely based on the API?

A: Yes, it is self-developed: AngularJS on the front end and Golang on the back end, all based on the Kubernetes API. Development is relatively simple because the Kubernetes API is very well designed. We recommend trying it.
The above was compiled from the group sharing session on the evening of April 25, 2017. The speaker, Li Dawei, is an architect at YeePay, mainly responsible for building and rolling out YeePay's container cloud, building the DevOps platform, and promoting its ideas. He holds a master's degree from Peking University and has 7 years of work experience, with a solid theoretical foundation and years of low-level development experience. An open source enthusiast, he now focuses on container technology and DevOps practice and has a strong interest in Docker, Kubernetes, DevOps, microservices, and more. DockOne organizes a technology sharing session every week; interested readers are welcome to add WeChat: Liyingjiesz to join the group, and you can leave us a message about topics you want to hear or share.