Xu Chao: "Using client-go to work with the native and extended Kubernetes API"


Hello everyone, I am Xu Chao. I have been working on Kubernetes development for more than two years.

Today I will talk about the client-go repository from a developer's point of view and how to build a controller with client-go. I will also cover the pitfalls we ran into during development, so that you can steer around them when you write your own controllers.

In addition, I will talk about the Kubernetes API itself, so that your controllers can become more powerful.

Let's start with the ways to communicate with the apiserver. The most commonly used is probably kubectl; there is also the officially supported UI, kube-dashboard, which is one of the projects Google has recently invested in.

During development and debugging you can also call Kubernetes' RESTful API directly, for example by writing a script that implements a controller.

However, these practices are not satisfactory either in terms of efficiency or programmability.

This is why we created client-go: we put the clients, utilities, and other pieces a controller needs into the client-go repository. If you need to write a controller, you can find the tools you need there. client-go is the Go-language client; besides Go, we now also support a Python client, currently in beta. The Python client is generated directly from the OpenAPI spec, and we will continue to generate clients for Java and other languages the same way.

Let's look at the contents of the client library first. It mainly includes the various clients: Clientset, dynamic client, and REST client. There are also utilities, such as workqueue and informer, that help you write your controller.

Let's take a look at the approximate structure of a kube-controller. A typical controller has one or more informers that track a resource, communicate with the apiserver, and reflect the latest state into a local cache. Whenever these resources change, the informer calls your callbacks. The callbacks only do some very simple preprocessing: they filter out changes you don't care about and put the interesting changed objects into a workqueue. The real business logic lives in the workers: a controller usually starts many goroutines running workers that process the items in the workqueue. A worker computes the difference between the state the user wants and the current state, and then sends requests to the apiserver through the clients to drive the cluster toward the desired state. In the diagram, blue is what client-go already provides and red is the code you fill in when you write the controller.

Let's take a closer look at the various clients.

First of all, the most common is the Clientset, the most widely used client in Kubernetes, and it is relatively simple to use. First select the group, such as core, then select the specific resource, such as pod or job, and finally fill in the verb (create, get, and so on).

Using the Clientset falls into two situations: inside the cluster and outside the cluster.

Inside the cluster: after the controller is containerized, it runs in the cluster as a pod. Just call rest.InClusterConfig(); the default service account can access the apiserver's resources.

Outside the cluster, for example on your local machine, you can use the same kubeconfig that kubectl uses to configure the client. If you are on a cloud such as GKE, you also need to import the auth plugin.
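
Here is a minimal sketch of both paths, assuming a recent client-go (newer versions take a context.Context on calls such as Get; older ones do not) and the usual ~/.kube/config location. The pod name is purely illustrative:

```go
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// In-cluster: the pod's service account token and CA are mounted automatically.
	config, err := rest.InClusterConfig()
	if err != nil {
		// Out-of-cluster: fall back to the same kubeconfig kubectl uses.
		config, err = clientcmd.BuildConfigFromFlags("", os.Getenv("HOME")+"/.kube/config")
		if err != nil {
			panic(err)
		}
	}

	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// group (CoreV1) -> resource (Pods) -> verb (Get).
	pod, err := clientset.CoreV1().Pods("default").Get(context.TODO(), "my-pod", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println(pod.Name)
}
```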

The Clientset is generated with client-gen. If you open pkg/api/v1/types.go, there is a comment line above the Pod definition, "+genclient=true", which means a client should be generated for this type. If you do your own API type extensions, you can generate the corresponding clients the same way.

The Clientset verbs have quite a few subtle, brain-burning details; let me help you work through them.

Let's start with Get's GetOptions, a 1.6 feature. If you look inside the client's Get, there is a field called ResourceVersion.

ResourceVersion is a logical clock inside Kubernetes, used for optimistic concurrency. If ResourceVersion is not set, the apiserver reads the latest value from etcd when it receives the request. If it is set to 0, the apiserver reads the value from its local cache, which may be somewhat stale. This reduces the load on the apiserver and on etcd. The biggest user right now is the kubelet, which frequently gets node status but does not need the very latest value; in a large cluster this saves a lot of CPU and memory. If ResourceVersion is set to a very large value, the GET request hangs at the apiserver and eventually times out if there is no response.
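
A minimal sketch of the cache-friendly read, again assuming a recent client-go; the function name is illustrative:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// getNodeFromCache reads a Node with ResourceVersion "0" so the apiserver can
// answer from its watch cache instead of going to etcd.
func getNodeFromCache(cs *kubernetes.Clientset, name string) error {
	// ResourceVersion "" (unset): apiserver reads the latest value from etcd.
	// ResourceVersion "0": apiserver may answer from its cache (possibly stale).
	_, err := cs.CoreV1().Nodes().Get(context.TODO(), name, metav1.GetOptions{ResourceVersion: "0"})
	return err
}
```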

Similarly, the list operation takes a ListOptions, which also has a ResourceVersion with the same meaning as in Get. We use it when writing informers: every controller sends a list request to the apiserver when it starts, and if every such request were served from etcd the overhead would be very large, so the list is served from the apiserver cache instead.

Watch also takes a ListOptions, but there the meaning of ResourceVersion is different: the apiserver sends all changes starting from that ResourceVersion. The best practice here is: always set the resource version. If it is not set, the apiserver starts pushing from an arbitrary point in time in its cache, and the controller's behavior becomes unpredictable.

Let's see how an informer combines list and watch. In an informer, we generally list first with ResourceVersion set to 0, so the apiserver can serve the list from its cache. When the list finishes, we take the list's ResourceVersion and set it in the watch's ListOptions, which guarantees the informer receives events without a gap.
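
A sketch of that list-then-watch pattern, written against a recent client-go; in practice the informer machinery does all of this for you:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func listThenWatch(cs *kubernetes.Clientset, namespace string) error {
	// List with ResourceVersion "0": served from the apiserver cache.
	pods, err := cs.CoreV1().Pods(namespace).List(context.TODO(), metav1.ListOptions{ResourceVersion: "0"})
	if err != nil {
		return err
	}

	// Watch from the ResourceVersion the list returned, so no event is missed.
	w, err := cs.CoreV1().Pods(namespace).Watch(context.TODO(), metav1.ListOptions{
		ResourceVersion: pods.ResourceVersion,
	})
	if err != nil {
		return err
	}
	defer w.Stop()

	// Note: the apiserver times watches out (roughly every 5-10 minutes by
	// default), so a real controller re-establishes the watch in a loop.
	for event := range w.ResultChan() {
		fmt.Printf("%s %v\n", event.Type, event.Object.GetObjectKind().GroupVersionKind())
	}
	return nil
}
```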

Also note that a watch is not once and for all: the apiserver times out a watch request, by default after 5-10 minutes, and then you need to watch again.

Now for update. The client has two kinds of update: Update and UpdateStatus.

The difference is that if you Update a pod, your changes to status will be overwritten by the apiserver; UpdateStatus is the opposite.

Kubernetes has an optimistic-concurrency mechanism: if two clients update the same object at the same time, one of them fails. So when writing code, the update is typically wrapped in a loop that retries until the apiserver returns 200 OK, which is how you know the update succeeded.
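
A common way to write that loop is client-go's retry helper (k8s.io/client-go/util/retry). A minimal sketch, with an illustrative label change and recent client-go signatures:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

func addLabel(cs *kubernetes.Clientset, namespace, name string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Re-Get inside the loop so each attempt starts from the latest object.
		pod, err := cs.CoreV1().Pods(namespace).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if pod.Labels == nil {
			pod.Labels = map[string]string{}
		}
		pod.Labels["example"] = "true"
		// Update writes spec/metadata; UpdateStatus would be needed for status.
		_, err = cs.CoreV1().Pods(namespace).Update(context.TODO(), pod, metav1.UpdateOptions{})
		return err // a conflict error triggers another iteration of the loop
	})
}
```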

In addition, Get+Update has a pitfall: suppose a new field was added to Pod in the cluster. If you use an old client that does not know about the new field, the pod you Get will not carry it, and when you Update, the new field gets overwritten.

This issue may be addressed in 1.7.

Patch is the counterpart of Update. Update is like a demolition crew: it pushes the whole object and redoes everything. Patch is like a scalpel: it can do fine-grained surgery and precisely modify a single field of an object.

If a patch hits a conflict, the apiserver retries it up to 5 times, so unless another user is patching the same field, a client's patch will generally succeed. Patch does have a performance cost, because the apiserver has to do JSON serialization and deserialization; we expect to optimize this in 1.7. If you don't care about the performance, we recommend patch.

A reminder: when you patch, the best practice is to include the original UID in the patch. The apiserver's key-value store uses "namespace + name" as the key, and at any single point in time this combination is unique. But once you add time to the axis, things change: suppose you have a pod, it gets deleted, and a while later a pod with the same name appears in the same namespace but with a completely different spec. The controller's old patch could then be applied to the new pod, which causes a bug. If the UID is included in the patch and this happens, the apiserver thinks you are trying to modify the UID, which is not allowed, so the patch fails and the bug is prevented.
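
A minimal sketch of a strategic-merge patch that carries the UID, assuming recent client-go signatures; the label being patched is only an example:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

func patchLabel(cs *kubernetes.Clientset, namespace, name string) error {
	pod, err := cs.CoreV1().Pods(namespace).Get(context.TODO(), name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Including metadata.uid: if the pod has since been deleted and recreated,
	// the apiserver rejects the patch instead of modifying the wrong object.
	patch := []byte(fmt.Sprintf(
		`{"metadata":{"uid":%q,"labels":{"example":"true"}}}`, pod.UID))
	_, err = cs.CoreV1().Pods(namespace).Patch(
		context.TODO(), name, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```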

DeleteOptions has a field called Preconditions, which has a UID option. It exists for the same reason: namespace+name is not unique across time.

At the time, we found that the Kubernetes CI tests often failed inexplicably. It turned out that a new pod had the same namespace and name as a pod deleted earlier; the kubelet didn't realize this and accidentally deleted the new pod. So when we delete, we set the precondition UID so that the wrong pod does not get deleted.

Delete also has a field called OrphanDependents, starting from 1.4. If it is set to true or left unset, the object may continue to exist for a while after Delete() returns, although it will eventually be removed; in this case the dependents will not be deleted. If it is set to false, then as long as Delete() returns, the object is guaranteed to be gone from the apiserver (unless you have set up a different finalizer), and the garbage collector will slowly remove the dependents in the background.
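
A sketch combining both points. Note that in newer API versions the OrphanDependents field discussed above has been superseded by PropagationPolicy, so that is what this example uses; the function is illustrative:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteExactPod deletes a pod only if its UID still matches, and asks for
// foreground deletion so dependents are cleaned up before the object goes away.
func deleteExactPod(cs *kubernetes.Clientset, namespace, name, uid string) error {
	policy := metav1.DeletePropagationForeground
	return cs.CoreV1().Pods(namespace).Delete(context.TODO(), name, metav1.DeleteOptions{
		// Only delete the object if it still has this UID.
		Preconditions: metav1.NewUIDPreconditions(uid),
		// Foreground: dependents are removed first; Orphan keeps them;
		// Background returns immediately and lets the garbage collector work.
		PropagationPolicy: &policy,
	})
}
```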

Now let's talk about another client called the dynamic client.

The dynamic client is more flexible, because you can operate on any resource you want. Its return value is not a structure but a map[string]interface{}. If a controller needs to handle all APIs, like the namespace controller or the garbage collector, it uses the dynamic client. You can use discovery to find out which APIs exist, and then access them with the dynamic client. The dynamic client also supports third-party resources.
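
A minimal sketch of the dynamic client: resources are addressed by GroupVersionResource and come back as unstructured objects (map[string]interface{} underneath). Recent client-go signatures assumed:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func listPodsDynamically(config *rest.Config) error {
	dyn, err := dynamic.NewForConfig(config)
	if err != nil {
		return err
	}
	// Any resource can be named this way, including custom resources.
	gvr := schema.GroupVersionResource{Group: "", Version: "v1", Resource: "pods"}
	list, err := dyn.Resource(gvr).Namespace("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, item := range list.Items {
		// item is an unstructured.Unstructured, not a typed *corev1.Pod.
		fmt.Println(item.GetName())
	}
	return nil
}
```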

The disadvantage of the dynamic client is that it only supports JSON serialization, and JSON is much less efficient than protobuf.

Now let's talk about the rest client.

The REST client is the foundation of the Clientset and the dynamic client. It is lower level; like the dynamic client, you can use it to operate on any resource. It supports Do() and DoRaw().

Compared with the dynamic client it supports both protobuf and JSON, so it is more efficient.

The problem is that if you want to access a third-party resource, you need to write the deserialization yourself; it cannot decode directly into a type. I will show this in the demo.
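
A minimal sketch of the REST client for a built-in type, where Into() can still decode into a typed object (recent client-go; older versions call Do() without a context):

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
)

func getPodViaRESTClient(cs *kubernetes.Clientset, namespace, name string) error {
	var pod corev1.Pod
	// You spell out namespace, resource, and name yourself.
	err := cs.CoreV1().RESTClient().
		Get().
		Namespace(namespace).
		Resource("pods").
		Name(name).
		Do(context.TODO()).
		Into(&pod)
	if err != nil {
		return err
	}
	fmt.Println(pod.Status.Phase)
	return nil
}
```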

Now let's talk about informers. An informer actually has two inputs: one is a list function and a watch function, and the other is a set of callbacks you provide. Once the informer is running, it maintains a local store. You can then read the local store directly instead of talking to the apiserver, which improves performance.

The benefits of using an informer are better performance and better reliability: if there is a network partition, the informer resumes watching from where it left off and will not miss any events.

Informers also have some best practices. First, before the controller runs, it is best to wait for the informers to sync (initialize). This avoids churn at controller startup: for example, the replica set controller watches both replica sets and pods; if it starts running before the caches sync, it thinks there are no pods yet and creates a lot of unnecessary pods that it then has to delete. It also avoids a lot of weird bugs; I hit plenty of them while writing the garbage collector.
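
A minimal sketch of starting a shared informer and waiting for its cache to sync before doing any work, using the shared informer factory from a recent client-go (resync period 0, see the note on resync below):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

func runPodInformer(cs *kubernetes.Clientset, stopCh <-chan struct{}) error {
	factory := informers.NewSharedInformerFactory(cs, 0) // resync period 0
	podInformer := factory.Core().V1().Pods().Informer()

	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*corev1.Pod)
			// A real controller would enqueue a key into a workqueue here.
			fmt.Println("added:", pod.Namespace+"/"+pod.Name)
		},
	})

	factory.Start(stopCh)
	// Block until the initial list has been reflected into the local cache.
	if !cache.WaitForCacheSync(stopCh, podInformer.HasSynced) {
		return fmt.Errorf("timed out waiting for caches to sync")
	}
	// ...start workers only after this point...
	<-stopCh
	return nil
}
```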

In addition, the local cache the informer provides is read-only. If you want to modify an object, first make a copy with DeepCopy, otherwise you may get read-write races. Also, your cache may be shared with other controllers, and modifying the cache would affect them.

The third thing to note is that the object the informer passes to your callbacks is not necessarily the type you expect. For example, if the informer tracks all pods, the object delivered may not be a Pod but a DeletedFinalStateUnknown. So in the delete handler, besides the tracked type you also have to handle DeletedFinalStateUnknown.
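
A minimal sketch of a DeleteFunc that handles both cases; the handler would normally be registered on a pod informer like the one above:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/cache"
)

func onPodDelete(obj interface{}) {
	pod, ok := obj.(*corev1.Pod)
	if !ok {
		// The informer missed the delete event and hands us a tombstone instead.
		tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
		if !ok {
			fmt.Printf("unexpected object type: %T\n", obj)
			return
		}
		pod, ok = tombstone.Obj.(*corev1.Pod)
		if !ok {
			fmt.Printf("tombstone contained unexpected object: %T\n", tombstone.Obj)
			return
		}
	}
	fmt.Println("deleted:", pod.Namespace+"/"+pod.Name)
}
```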

The last thing to mention is the informer's resync option. It simply re-enqueues everything in the local cache into the FIFO periodically; it does not re-list the latest state from the apiserver. This option is generally not needed, so you can safely and boldly set the resync period to 0.

Finally, let's talk about the workqueue.

It mainly exists to enable concurrent processing: the callbacks can add state to the workqueue in parallel, and many workers consume from it.

One guarantee the workqueue provides is that even if the same object, say a pod, is added to the workqueue multiple times, it appears only once in the dequeue. This prevents the same object from being processed by more than one worker at the same time.

The workqueue also has some very useful features. For example, rate limiting: if you take an object from the workqueue, hit an error while handling it, and put it back, the workqueue guarantees the object will not be re-processed immediately, which prevents hot loops.
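
A minimal sketch of a worker loop around the classic rate-limited workqueue: failed items are re-queued with backoff, successful ones are forgotten. processKey is a hypothetical business-logic function:

```go
package main

import (
	"k8s.io/client-go/util/workqueue"
)

func worker(queue workqueue.RateLimitingInterface, processKey func(key string) error) {
	for {
		item, shutdown := queue.Get()
		if shutdown {
			return
		}
		key := item.(string)

		if err := processKey(key); err != nil {
			// Re-queue with backoff so a failing object does not hot-loop.
			queue.AddRateLimited(key)
		} else {
			// Clear the rate limiter's memory of this key.
			queue.Forget(key)
		}
		// Done releases the key so it can be processed again if re-added.
		queue.Done(item)
	}
}
```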

Another feature is Prometheus monitoring: you can monitor the queue length, latency, and so on in real time, and check whether the queue is keeping up with processing.

Now I'll show you a demo (https://github.com/caesarxuchao/servicelookup). Through the Kubernetes API you cannot quickly look up the corresponding service from a pod's name. Of course you can take the pod's labels and compare them against service selectors to work out the service, but doing this reverse query is time-consuming.

So here I wrote a controller that watches all endpoints and pods, does the matching, and finds the services a pod belongs to.

I start two informers: one tracks all pod changes, and the other tracks all endpoints changes.

Callbacks are registered on the informers to put pod and endpoints changes into the workqueue.

Then many workers are started, and they take pods and endpoints out of the workqueue.

At run time, start the two informers, wait for them to sync, and finally start the workers.

The demo code is on GitHub: https://github.com/caesarxuchao/servicelookup.
