Deploying and Running Codis in a Kubernetes Environment


Motivation

Why build Codis on Kubernetes? One reason is that many of our services already run on Kubernetes, so migrating Codis there makes it more convenient to use. Another, more important reason is that Kubernetes' features make it possible to build a Codis system that requires essentially no human operation (if you are familiar with Codis, you know that it needs a fair amount of manual intervention, and abnormal situations basically all require a person to handle them). The principles, implementation, and some of the trade-offs behind this hands-off design are described in detail below.

Goal

Deploy a complete Codis system in a Kubernetes environment with one command. The system should run stably and, in some cases even extreme ones (such as a physical machine going down), repair itself and return to normal operation, so that after deployment essentially no human intervention is needed. It should also support one-command scaling of Codis (both the proxies and the backend servers). After extensive testing and validation in a production environment, these goals have now largely been met.

GitHub

https://github.com/left2right/codis (based on Codis 3.2)

Dependencies

Kubernetes Environment

Deploying a Codis system on Kubernetes requires a basic Kubernetes environment and a Docker environment; the specific deployment methods are described in their respective documentation.
If you want to quickly set up a single-node Kubernetes environment of your own for testing or research, you can follow these two steps (on a Mac):

    1. Deploy Docker
      https://docs.docker.com/toolbox/toolbox_install_mac/
    2. Deploy Minikube
      https://github.com/kubernetes/minikube (brew cask install minikube)

      Start and verify Kubernetes (Minikube):

      ### Start
      $ minikube start
      ### Verify
      $ minikube status
      minikubeVM: Running
      localkube: Running

Image dependencies

    1. ZooKeeper image:
      gcr.io/google_samples/k8szk:v1
    2. Golang image:
      golang:1.7.5
      Pulling both of these images requires getting around the GFW; alternatively, you can find a domestic mirror to obtain them.

      Verify that the images were pulled successfully:

      ### Check the zookeeper image
      $ docker images | grep k8szk
      gcr.io/google_samples/k8szk  v1  ... ...
      ### Check the golang image
      $ docker images | grep golang
      golang  1.7.5  ... ...

Build steps

Once the Kubernetes platform is installed and the two image dependencies above have been pulled, you can build the Codis system as follows:

### Get the codis source code
$ git clone https://github.com/left2right/codis -b release3.2-k8s
### Build the codis docker image
$ cd codis
$ docker build -f Dockerfile -t codis-image .
### Build the codis cluster in the kubernetes environment
$ cd kubernetes
$ sh start.sh buildup
... ... plz wait
PONG
# When you see this PONG, the codis cluster has been built successfully.
# If you never get a PONG, check the specific error output. Good luck!

Components

File descriptions

The codis/kubernetes/ directory contains the following files:

  1. readme.md: usage instructions, covering creating a Codis cluster, destroying it, scaling it, and so on:
    ### Build one codis cluster (each codis master server has one slave)
    $ sh start.sh buildup
    ### Clean up the codis cluster
    $ sh start.sh cleanup
    ### Scale codis cluster proxy
    $ sh start.sh scale-proxy $(number)
    ### Scale codis cluster server
    $ sh start.sh scale-server $(number)
  2. start.sh: the Codis cluster operation script, with the functions described in readme.md above. It is quite a simple script; if you want to change or test something, it is a good base to experiment on ~
  3. codis-service.yaml: YAML file for all of Codis' Services in the Kubernetes environment, covering codis-dashboard, codis-proxy, codis-server, codis-fe, and codis-ha
  4. codis-dashboard.yaml: YAML file for codis-dashboard in the Kubernetes environment
  5. codis-proxy.yaml: YAML file for codis-proxy in the Kubernetes environment
  6. codis-server.yaml: YAML file for codis-server in the Kubernetes environment
  7. codis-ha.yaml: YAML file for codis-ha in the Kubernetes environment
  8. codis-fe.yaml: YAML file for codis-fe in the Kubernetes environment
  9. zookeeper/zookeeper-service.yaml: YAML file for the ZooKeeper Service in the Kubernetes environment
  10. zookeeper/zookeeper.yaml: YAML file for ZooKeeper in the Kubernetes environment

Component Introduction

From the files above you can see the main components: ZooKeeper (which can be replaced with etcd, etc.), codis-dashboard, codis-proxy, codis-server, codis-ha, and codis-fe. Each component has a corresponding Service, plus a specific controller that organizes its pods.

  1. Zookeeper
    The ZooKeeper setup follows the official Kubernetes tutorial at https://kubernetes.io/docs/tutorials/stateful-application/zookeeper/, except that, for convenience, the volume mounts in the corresponding YAML file are commented out (in production you should persist ZooKeeper's data by enabling the volume mounts). See the official tutorial for usage and related instructions.

  2. Codis-dashboard
    codis-dashboard is the Codis dashboard; its role in Codis is described on GitHub. The dashboard uses a StatefulSet to organize its pods, with replicas set to 1, since the whole cluster allows only one dashboard. If codis-dashboard becomes abnormal, Kubernetes shuts it down, restarts it, and rejoins it to the cluster.
    When the dashboard starts successfully, a Kubernetes postStart hook sends a command to check whether the ZooKeeper connection is normal, which makes it easier to quickly locate the cause of a problem. When the dashboard is shut down, a preStop hook ensures that it shuts down cleanly (its lock on ZooKeeper is deleted).
    In some extreme situations (such as a physical machine going down), the dashboard shuts down abnormally and its lock on ZooKeeper is not removed (which would prevent the dashboard from starting next time). To let the cluster quickly return to a normal state, we added a clear-lock switch (--remove-lock) to the codis-dashboard startup; when it is turned on, the stale lock is cleared once before the dashboard registers its own lock.
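    The hooks are ordinary Kubernetes container lifecycle hooks. A minimal sketch of their shape follows (the actual commands live in codis-dashboard.yaml; the script paths below are illustrative placeholders, not the real ones):
      lifecycle:
        postStart:
          exec:
            # placeholder: the real hook sends a command that checks the zookeeper connection
            command: ["sh", "-c", "/scripts/check-zk.sh"]
        preStop:
          exec:
            # placeholder: the real hook shuts the dashboard down cleanly so its zookeeper lock is released
            command: ["sh", "-c", "/scripts/stop-dashboard.sh"]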

  3. Codis-proxy
    codis-proxy is the Codis proxy. Because the proxy is stateless, a ReplicationController is used to organize its pods, which allows rapid scaling in both directions. The number of proxies is set by replicas and defaults to 2.
    Before a codis-proxy is shut down, it sends a command to remove itself from the cluster. If some anomaly prevents this (described below), codis-ha removes the proxy from the cluster instead; Kubernetes then starts a new proxy, which joins the cluster.
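    A minimal sketch of how the stateless proxy maps onto a ReplicationController (the image name matches the build step above; the port and labels are illustrative, see codis-proxy.yaml for the real spec):
      apiVersion: v1
      kind: ReplicationController
      metadata:
        name: codis-proxy
      spec:
        replicas: 2                  # default proxy count; changed by scale-proxy
        selector:
          app: codis-proxy
        template:
          metadata:
            labels:
              app: codis-proxy
          spec:
            containers:
            - name: codis-proxy
              image: codis-image     # image built in the build steps above
              ports:
              - containerPort: 19000 # codis-proxy's default client port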

  4. Codis-server
    codis-server is the Codis backend Redis server. Because the Redis servers are divided into groups, and to make them easy to manage and able to return to normal without human involvement, a StatefulSet is used to organize the pods. Its replicas is the total number of codis-servers in the whole cluster; codis-server.yaml also configures an environment variable SERVER_REPLICA, which sets the number of codis-servers in each group and defaults to 2 (one master, one slave).
    To keep the codis-servers, and in particular the servers of one group, from all landing on the same physical node, we use Kubernetes pod anti-affinity (podAntiAffinity). So that a Codis cluster can still be deployed when physical nodes are limited, we use preferredDuringSchedulingIgnoredDuringExecution. If you want to guarantee that no two codis-servers ever share a node, you can use requiredDuringSchedulingIgnoredDuringExecution instead, which improves the system's availability; however, if there are not enough nodes, deploying the Codis cluster, or scaling out codis-server, will fail. A sketch of the preferred rule follows.
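    A minimal sketch of the preferred anti-affinity rule in the pod spec (the label is illustrative; see codis-server.yaml for the real selector):
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: codis-server
              topologyKey: kubernetes.io/hostname   # prefer spreading servers across nodes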

    When codis-server starts, a postStart hook attempts to create the group the server belongs to, join the server to that group, and establish a master-slave relationship with the group's master. When codis-server shuts down, it tries to remove itself from its group. Which group a codis-server belongs to is computed from the ordinal $sid that the StatefulSet assigns to the server and the configured SERVER_REPLICA, as gid=$(expr $sid / ${SERVER_REPLICA} + 1); for example, with SERVER_REPLICA=2, the pods with ordinals 0 and 1 both land in group 1. Some operations in this process will fail, such as creating a group that already exists. Some failures have no effect and are simply skipped; others leave the system in an abnormal state, from which codis-ha restores it to normal, as described below.
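    The per-group size is just an environment variable in the codis-server container spec (a sketch; the value shown is the default mentioned above):
      env:
      - name: SERVER_REPLICA
        value: "2"   # servers per group: pod ordinal sid maps to gid = sid / SERVER_REPLICA + 1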

    To improve the system's availability, you can mount the directory where the RDB file is saved to external storage, so it is loaded at startup; the mount directory is /codis, and the mounting method is similar to ZooKeeper's. Considering performance, however, the RDB file is not mounted externally by default; if you want higher availability, you can refer to the zookeeper.yaml files for how to mount it, or the sketch below.
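    A sketch of what mounting /codis externally could look like, mirroring the ZooKeeper approach (the claim name and storage size are assumptions, not the repository's values):
      volumeMounts:                # in the codis-server container spec
      - name: codisdir             # hypothetical claim name
        mountPath: /codis          # RDB files are saved here and loaded at startup
      volumeClaimTemplates:        # at the StatefulSet spec level
      - metadata:
          name: codisdir
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi        # assumption; size it to your data set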

  5. Codis-ha

    The tool currently maintained by Codis upstream for server HA is redis-sentinel; the earlier codis-ha was removed. That choice was made because codis-ha is a single point: if it fails, the cluster's high availability is hard to guarantee, and especially in situations that need codis-ha's involvement, returning to normal takes a long time. redis-sentinel, by contrast, runs in cluster mode and does not stop working because one node fails. Nevertheless, we ultimately chose codis-ha (modified from the original codis-ha), because what Sentinel can do is limited: it only re-elects a master when the master's state is abnormal. If a slave's state is abnormal, Sentinel cannot re-establish the master-slave relationship; it clearly cannot restore the state of every server in a group, nor restore the proxy state to normal. And in a Kubernetes environment an abnormal pod is quickly shut down and restarted, so in this scenario codis-ha's disadvantages relative to Sentinel are greatly reduced; in actual use the results are indeed good.

    codis-ha's strategy is to check the state of the proxies and servers through the dashboard. If a proxy's state is abnormal, it shuts the proxy down; Kubernetes restarts the proxy, which rejoins the cluster. If a server's state is abnormal, there are several cases: 1. it is a master and has a slave: select a suitable slave (based on how long it has been disconnected from the master) and promote it to master; 2. it is a master with no slave: shut it down directly so it restarts; 3. it is a slave: shut it down so it restarts and rejoins the cluster.

    Because codis-ha relies heavily on codis-dashboard, we use pod affinity to deploy codis-ha and codis-dashboard on the same node, as sketched below.
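    A minimal sketch of that affinity rule (the label is illustrative; see codis-ha.yaml for the real selector):
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: codis-dashboard
            topologyKey: kubernetes.io/hostname   # schedule onto the dashboard's node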

  6. Codis-fe

    codis-fe is the Codis web UI, a graphical interface for operating the cluster. In the Kubernetes environment it is mainly used to observe cluster state, QPS, and so on.

Source code changes

Compared with upstream Codis, this Kubernetes version makes the following changes to the source code:

    1. Added a kubernetes directory containing the corresponding Kubernetes YAML files and scripts
    2. Modified codis-ha to maintain the cluster (server and proxy) state and quickly restore codis-proxy and codis-server to a normal state
    3. Modified codis-dashboard, adding the command-line options product_name and product_auth, so that Codis clusters for different products can be created, and the remove-lock switch, so that the dashboard recovers quickly in exceptional cases
    4. Modified codis-proxy, adding the command-line options product_name and product_auth, so that Codis clusters for different products can be created

Precautions

    1. Try to make sure that the codis-server replicas is divisible by SERVER_REPLICA (both are set through the YAML files); if they are not, the groups will be uneven when the servers join their groups. For example, with replicas 5 and SERVER_REPLICA 2, the third group gets a master but no slave. Whatever positive integer SERVER_REPLICA is set to, just make sure the cluster does not end up with some groups having slaves while others have only a master: keep the numbers divisible ~
    2. Keep an eye on your data size and your machine's actual available memory ~
    3. It is recommended that you run Codis as a cache for a period of time first, until you are familiar with it and have more confidence in its stability (and, perhaps more importantly, in your actual environment), before considering using it as a DB for some of your data

Locating problems

    1. The codis-ha log reflects the state of the whole cluster in real time, including the proxies and servers, and indirectly ha and the dashboard as well (if ha has a problem, the log stops being written; if the dashboard has a problem, ha fails too). To view it:

      $ kubectl exec -it codis-ha-0 bash
      $ tail -f log/codis-ha-0.....
    2. The dashboard log can further locate the cause of a problem; view it in a similar way:

      $ kubectl exec -it codis-dashboard-0 bash
      $ tail -f log/codis-dashboard-0.....
    3. If you follow the steps above, especially after you get Codis running successfully, and your testing turns up any problems, you are welcome to contact me (yqzhang@easemob.com) ~

Other

There are some details and thoughts not yet written up; I will add them another day ~
