"Editor's word" This share includes the following 4 areas:
- SAP Anywhere product features and development team profile: what problems does CD need to solve
- Jenkins, Docker, Kubernetes: the basic components of CD
- Jenkins Pipeline, automated testing, automated deployment: the implementation of CD
- Troubleshooting, diagnosis, monitoring, and analysis: the logistical support of CD
SAP Anywhere product features and development team profile: what problems does CD need to solve
Hello, I am a system architect from the SME division of SAP China Research Institute, currently working on CI/CD for the SAP Anywhere product. Today's topic is the CD implementation behind SAP Anywhere. We have only just started in this area, so the content may have gaps and omissions; please bear with me.
First, allow me to briefly describe the characteristics of SAP Anywhere and what problems CD needs to solve. Our product, SAP Anywhere, is a cloud-based ERP SaaS solution. The product UI is a pure-HTML5 Web UI plus a hybrid-mode mobile UI; behind them, the backend is composed of close to 30 microservices, exposed to the outside world uniformly through an API gateway.
Because we are a large company, our development team is scattered around the world: more than 200 people in total submit code, averaging more than 300 commits per day. The problem CI has to solve is to provide a stable and efficient platform on which every commit from every developer is immediately built and verified, thereby blocking the "harmful" commits. Once code passes the CI build and tests, it is merged into the main branch, and the next step is to automatically set up a demo system for further testing by developers and testers.
The challenge in building a demo system lies in the microservices architecture: deploying a monolithic service is much simpler than deploying a bunch of distributed microservices. In fact, the problem is compounded by the dependencies between services and the need for service discovery.
To solve these problems, we introduced a series of tools such as Jenkins Pipeline and Kubernetes to help us build our CD system.
Jenkins, Docker, Kubernetes: the basic components of CD
Let me first introduce the basic components that make up our CD. We have built multiple Kubernetes clusters for running Jenkins, running CI slow tests, and deploying demo environments, so our foundation is Docker, Jenkins, and Kubernetes. Apart from the Jenkins cluster, which runs on physical machines, all other machines are vSphere virtual machines. In addition, we have built clusters on AWS.
Physical machines give better I/O performance (and thus faster builds), while virtual machines are easier to manage. This is a typical place where you have to weigh trade-offs.
For the operating system we chose CoreOS, plus a small number of Ubuntu machines used as build machines. The network scheme is the commonly used flannel overlay network with a VXLAN backend.
For Kubernetes admission controllers we use the recommended ServiceAccount, LimitRanger, and NamespaceLifecycle. For add-ons we chose Dashboard, Heapster with Grafana for monitoring, and SkyDNS.
In fact, we require every microservice to provide its REST API over HTTP on port 80, which makes mutual discovery between services simple and uniform: we use SkyDNS for service discovery. For example, to get all open invoices you call http://invoice:80/api/v1/Invoices?status=open, and to query customer information by region you call http://customer:80/api/v1/Customers?location=CN (the port number 80 can of course be omitted).
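As a minimal sketch of what a service-to-service call looks like under this convention (assuming the standard Python requests library; the endpoints are the ones from the example above), the caller only needs the service name, which SkyDNS resolves inside the cluster:

```python
import requests

# Inside the cluster, SkyDNS resolves the service name ("invoice", "customer")
# to its cluster IP, so a plain HTTP call on port 80 is all a caller needs.

def list_open_invoices():
    # Port 80 is the convention, so it can be omitted from the URL.
    resp = requests.get("http://invoice/api/v1/Invoices",
                        params={"status": "open"}, timeout=5)
    resp.raise_for_status()
    return resp.json()

def customers_in_region(location="CN"):
    resp = requests.get("http://customer/api/v1/Customers",
                        params={"location": location}, timeout=5)
    resp.raise_for_status()
    return resp.json()
```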
Unlike ZooKeeper-style solutions, we do not use a service-registration mechanism to implement service discovery. This is not an innovation, just a different way of thinking about the problem.
Jenkins Pipeline, automated testing, automated deployment: the implementation of CD
Digressions aside, let's continue with CD. To let every developer's commit pass through the CI build and automated tests quickly and reliably, we built our own CD pipeline on top of the Jenkins 2.0 Pipeline plugin to validate each commit.
In the pipeline view (figure omitted), each row represents the complete process of one code commit. The core steps are Build & Fast Test and Slow Test: the former handles the build and unit tests (typically mvn install, gradle build, or gulp build), while the latter brings up a "semi-complete" environment to run integration tests. Failure at any step causes the current commit to be rejected. We have set up a corresponding CD pipeline for each of our more than 60 Git repos; one of them belongs to a microservice named productivity.
In the last step of the CD pipeline, the post-build step, we trigger a demo-system build based on the current commit. A complete demo system can usually be deployed in about 10 minutes and handed over to developers and testers. This significantly shortens our development iteration cycle.
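As a rough illustration of the stage ordering described above (not our actual Jenkinsfile, which is written against the Jenkins 2.0 Pipeline DSL; the slow-test and deploy helper scripts below are hypothetical), the flow for one commit can be sketched like this:

```python
import subprocess

def run(cmd):
    """Run one pipeline step; a non-zero exit code rejects the commit."""
    print(f"==> {cmd}")
    subprocess.run(cmd, shell=True, check=True)

def pipeline(commit_id):
    # Build & Fast Test: compile the artifact and run unit tests.
    run("mvn install")                         # or: gradle build / gulp build

    # Slow Test: bring up a "semi-complete" environment and run integration tests.
    run(f"./ci/run-slow-test.sh {commit_id}")  # hypothetical helper script

    # Post-build: trigger a full demo-system deployment for this commit.
    run(f"./cd/deploy-demo.sh {commit_id}")    # hypothetical helper script

if __name__ == "__main__":
    pipeline("abc1234")
```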
Troubleshooting, diagnosis, monitoring, and analysis: the logistical support of CD
Every system needs monitoring, especially a distributed one. To help developers quickly locate problems in the demo systems, we use an ELK stack to collect and present the microservice logs. At the same time, we feed large numbers of log files into a Hadoop system for big-data analysis to optimize the product itself.
We also use Kubernetes' native health-check mechanism to monitor the state of each service. This means every microservice must provide an HTTP REST endpoint, such as /health, to report its own health. When /health returns a non-200 status code, the service is killed and restarted.
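A minimal sketch of such a /health endpoint, assuming Flask and a hypothetical check_dependencies() helper standing in for whatever internal checks a service actually performs:

```python
from flask import Flask

app = Flask(__name__)

def check_dependencies():
    # Hypothetical placeholder: verify database connections, message queues, etc.
    return True

@app.route("/health")
def health():
    # Kubernetes treats any non-200 response as unhealthy and restarts the container.
    if check_dependencies():
        return "OK", 200
    return "UNHEALTHY", 503

if __name__ == "__main__":
    # By our convention every microservice serves its REST API on port 80.
    app.run(host="0.0.0.0", port=80)
```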
Typically, a 30-node cluster can run 20 demo systems, and we use Kubernetes namespaces to isolate the systems from one another. As mentioned earlier with LimitRanger, we assign default resource limits to each namespace: for example, each container requests at least 64M of memory and 0.2 logical CPUs, and may consume at most 1G of memory and one logical CPU. Since adopting LimitRanger, our cluster stability has improved greatly.
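A sketch of such a per-namespace default, written here with the official Kubernetes Python client rather than a YAML manifest; the namespace name is hypothetical, and the figures are the ones quoted above:

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
core = client.CoreV1Api()

limits = client.V1LimitRange(
    metadata=client.V1ObjectMeta(name="demo-defaults"),
    spec=client.V1LimitRangeSpec(limits=[
        client.V1LimitRangeItem(
            type="Container",
            default_request={"memory": "64Mi", "cpu": "200m"},  # at least 64M / 0.2 CPU
            default={"memory": "1Gi", "cpu": "1"},              # at most 1G / 1 CPU
        )
    ]),
)

# "demo-042" is a hypothetical namespace holding one demo system.
core.create_namespaced_limit_range(namespace="demo-042", body=limits)
```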
After a long period of operation and observation, we found that the Docker daemon on the nodes would die from time to time. The symptoms are that containers can no longer be created or destroyed and docker ps hangs. To alleviate this problem, we used a Kubernetes DaemonSet to run a home-grown watchdog on every node, which simply restarts the node whenever it detects a dead daemon.
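A minimal sketch of what such a watchdog loop might look like (our actual DaemonSet image is not shown in this article; the probe command, timeouts, and the reboot action are assumptions):

```python
import subprocess
import time

PROBE = ["docker", "ps"]   # a hung daemon typically makes this command hang too
PROBE_TIMEOUT = 30         # seconds before we declare the daemon dead
CHECK_INTERVAL = 60        # seconds between probes

def docker_daemon_alive():
    try:
        subprocess.run(PROBE, check=True, timeout=PROBE_TIMEOUT,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return False

if __name__ == "__main__":
    while True:
        if not docker_daemon_alive():
            # The "violent" remedy: reboot the whole node. This assumes the pod is
            # privileged and can reach the host, e.g. via nsenter or host systemd.
            subprocess.run(["reboot"])
        time.sleep(CHECK_INTERVAL)
```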
Q&A
Q1: I would like to know whether the DaemonSet watchdog you developed could be replaced with something like supervisor, and what needs attention if so. Also, we previously had a case where monitoring saw the web service return 200 while the service had actually hung; how do you monitor events like that?
A: DaemonSet is a Kubernetes feature that schedules pods onto every node that meets the given criteria. Our usage scenario is to dispatch the watchdog program to the nodes of each cluster to do the monitoring.
Q: What types of applications do you mainly run?
A: Our business is a Web-UI-based ERP product.
Q: When a Kubernetes health check uses an HTTP request, the container itself may need a long time to start, so won't this mechanism keep killing and restarting it in a loop? Is there a good way to solve this?
A: Kubernetes health checks are divided into readinessProbe and livenessProbe. The former determines whether the container is ready to serve; when the latter fails, the container is killed directly. Some containers start slowly, so you need to set a wait time. Our wait times are 15s for readiness and 2min for liveness.
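A sketch of those two probes with the delays quoted above, again using the Kubernetes Python client; the container name and image are hypothetical, and the path/port follow the /health-on-port-80 convention described earlier:

```python
from kubernetes import client

readiness = client.V1Probe(
    http_get=client.V1HTTPGetAction(path="/health", port=80),
    initial_delay_seconds=15,    # readiness checks start 15s after the container starts
)

liveness = client.V1Probe(
    http_get=client.V1HTTPGetAction(path="/health", port=80),
    initial_delay_seconds=120,   # 2 minutes before a failing check can kill the container
)

container = client.V1Container(
    name="invoice",                               # hypothetical microservice
    image="registry.example.com/invoice:latest",  # hypothetical image
    ports=[client.V1ContainerPort(container_port=80)],
    readiness_probe=readiness,
    liveness_probe=liveness,
)
```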
Q: You mentioned that you later split the original cluster into several by function and that operations and stability improved greatly. How exactly did you split it?
A: We split by function. For example, one cluster is dedicated to running Jenkins and another is dedicated to running slow tests. By the way, a slow test is a test that further verifies whether the current commit contains a bug; because it takes longer to run, it is called a slow test.
Q: Does the Kubernetes API server often go down? Is that related to cluster size? Can the API server be load-balanced across multiple instances? When the API server is down, do the scheduler and controller-manager become the bottleneck?
A: The early outages may have been due to our own misuse, such as exposing the API server to every developer (we have more than 200 people), after which everyone ran all kinds of kubectl commands against it. A few commands are particularly resource-hungry: one is port-forward, the other is watching kubectl get pod.
Q: How do you manage your app's configuration?
A: That's a good question, and our internal discussions have been quite heated. At present there are two extremes: one side prefers that each service manage its own configuration, the other prefers a dedicated configuration microservice. Each has its own pros and cons.
Q: Does a demo system include the dependent services? If so, when several services in a demo system need to be released, do you build a separate demo system for each one?
A: As mentioned earlier, we have nearly 30 microservices, and yes, each demo system contains all of them, so a demo system is very "heavy". Of course, when we build a demo system we mainly rely on Kubernetes' scheduling ability to run the tasks in parallel.
Q: Does the Web UI application involve load balancing? Could you introduce that?
A: We use Nginx as the reverse proxy and Kubernetes Deployments/ReplicaSets as the load balancer. The Nginx upstream/server entries are written with the services' internal domain names, which SkyDNS resolves to cluster IPs, and iptables then routes the traffic round-robin to the real pods.
Q: "Detach according to function." For example a cluster dedicated to running Jenkins, another cluster dedicated to running Slow-test. Btw,slow-test is a test used to further verify the existence of a bug in the current commit, because the running time is longer, so called Slow-test ", you originally mixed in a cluster of AH?
A: Yes, in the early days everything ran in one cluster. Back then we worshipped Kubernetes and believed that a masterpiece from Google could surely bind heterogeneous systems into one cluster and manage all the applications inside it well. Reality was disappointing; we cannot rule out problems in our own code, but in short, running everything together was less stable, and failures were harder to recover from because the blast radius was larger.
Q: (the first question again) Could the DaemonSet watchdog be replaced with something like supervisor, and what should be paid attention to? And how do you catch the case where monitoring sees HTTP 200 but the web service has actually hung?
A: To supplement the answer to the first question: monitoring a web service this way requires the service to expose (implement) a special API, usually /health, and the service itself performs the health scan inside that API. Kubernetes then knows whether the service is in a zombie state.
Q: Have you enabled HTTPS on your Kubernetes API? If so, is there an example of pods accessing the API?
A: We did turn on HTTPS, but in most cases we still connect to the API server over the HTTP port. Our next goal is to fully enable the token-based security model, so that whenever a pod accesses the API server a service account token is available.
Q: Hello, are Jenkins and the slow tests separated by namespace within one cluster, or deployed as a separate Kubernetes cluster?
A: The split I mentioned is a physical split into two Kubernetes clusters. We are still feeling our way here: is it easier to manage several small clusters or one large cluster?
Q: Have you considered using Docker's official Swarm? I'm a novice and don't know which to choose.
A: Kubernetes and Docker Swarm are essentially two different schools, with a bit of a "good and evil cannot coexist" flavor. In fact, we also looked at Mesosphere during early prototyping. Personal opinion: judging from current development, it is hard for Kubernetes and Swarm to converge, which means choosing one inevitably means giving up the other. Kubernetes is Google's main push, and considering that Google's internal Borg system, the prototype of Kubernetes, runs millions of applications every day, it should not be a wrong choice.
The above content is organized from the group sharing on the evening of July 28, 2016.
Speaker: Chen Yun Zhe (Miles Chen), senior development engineer, full-stack engineer, and system architect. With more than ten years in the industry, he currently works at SAP China Research Institute as an architect responsible for CI/CD of SAP Anywhere. DockOne Weekly organizes regular technology sharing; interested readers can add WeChat: liyingjiesz to join the group, and leave a message with topics you would like to hear or share.