What is the difference between Borg and Kubernetes? What will the cloud need in the future?



Hello everyone, I am Zhong Cheng from Huawei's PaaS department, where I am currently working on related products. The topic I want to share is "from Borg to Kubernetes"; Borg is in fact the predecessor of Kubernetes. I will cover three main aspects today: first, an introduction to Borg; second, what Kubernetes has changed relative to Borg and its development direction; and third, what kind of products or forms the cloud may need in the future.

What is Borg? What problems does it solve?

Let's first look at the first topic: What is Borg? What problems does it solve?

Let's take a look at this picture. It comes from Star Trek, which I believe most people have seen. The Borg are an alien race in it. What do they do? They make contact with other civilizations and assimilate them: they transform you into a creature that is half person and half machine, you become part of their collective, and they continue to expand across the universe. I think this is a cool race. Google named its large-scale cluster management system Borg in the hope that the system could likewise turn different machines into its own and run its own programs on them.

For Google, Borg is the top-level unified management system. Most of Google's applications and frameworks run on it, including Gmail, Google Docs, and Web Search, as well as underlying frameworks such as MapReduce (MR) and the GFS distributed storage system. In other words, you can think of virtually all applications as relying on it to manage the underlying physical machines. Borg has been used successfully inside Google for more than 10 years.

Let's take a look at the overall Borg architecture. It is a typical distributed platform architecture: a logical master, plus a large number of nodes, each running its own agent. How does an engineer at Google use this system? They submit an application (a Task) using Borgcfg (a command-line tool) or a Web UI. A Task can be almost anything: a service, a batch job, or an MR job. The task is submitted through Borgcfg to the BorgMaster, which accepts the request and puts it in a queue. The Scheduler scans the queue, reads each task's resource requirements, and searches the cluster for matching machines, that is, machines with enough idle resources at the underlying layer. It then assigns the task to a machine, where it starts running.
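The queue-and-match loop described above can be sketched roughly as follows. This is an illustrative toy, not Borg's actual algorithm; the function names, the dict fields, and the "most free CPU" tie-breaking rule are all assumptions for the sake of the example.

```python
def find_feasible_machines(task, machines):
    """Return machines with enough free CPU and memory for the task."""
    return [
        m for m in machines
        if m["free_cpu"] >= task["cpu"] and m["free_mem"] >= task["mem"]
    ]

def schedule(queue, machines):
    """Assign each queued task to a feasible machine (here: the one with
    the most free CPU); tasks with no feasible machine stay pending."""
    assignments = {}
    for task in queue:
        feasible = find_feasible_machines(task, machines)
        if not feasible:
            continue  # leave the task in the queue
        best = max(feasible, key=lambda m: m["free_cpu"])
        best["free_cpu"] -= task["cpu"]   # reserve the resources
        best["free_mem"] -= task["mem"]
        assignments[task["name"]] = best["name"]
    return assignments
```

Real schedulers score feasible machines against many more criteria (package locality, spreading, priorities), but the filter-then-score shape is the same.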

This is the overall Borg framework. A typical startup takes about 25 seconds from application submission to the application running. About 80% of that time, roughly 20 seconds, is spent downloading the application package onto the node, while scheduling itself takes less than 5 seconds. Scheduling, in other words, is very fast.

BorgMaster Scheduling Principle

One of the key points I want to talk about today is the BorgMaster, which manages many applications. How does it prioritize them, and decide which machine should run which application? Borg's practice is to estimate the resources of each individual Task. In the figure there are several lines. The outermost dotted line is the resource limit submitted by the user: a hard limit that a task's usage cannot exceed, and if it does, the task is throttled to stop it from exceeding the limit. But this is only the number the user submitted, and as we all know, user-submitted numbers are often inaccurate: you cannot precisely estimate how much CPU and memory your program will consume. So how does Borg evaluate the resource? About 300 seconds after a Task starts, resource reclamation begins. The yellow area in the middle is the resource the application actually consumes. Borg pushes the estimate inward from the outside toward the green area and draws a line at its edge. This line is the so-called resource reservation: the amount the Borg system considers your application to need for long-term stable operation.

Here is a question: why does Borg need to do this? The reason is to free up the remaining resources. If Borg knows how much an application actually uses, and draws a safety line above that, the resources between the reservation and the limit can be scheduled, that is, used by other applications.

The green region consists of the yellow block (actual usage) plus a safety margin. The amount of resources consumed by the application is recalculated every few seconds, so this is a dynamic process: the reservation is not fixed once computed, and the green region can expand back out to the dotted limit. That is the policy for a single Task. Borg also distinguishes between the kinds of applications running on the system: what applications exist, and what features they have. One type is the production application, the prod task, which never stops. It is a long-running, user-facing process, such as Gmail or Web Search, which cannot be interrupted. Its response time ranges from a few microseconds to several hundred milliseconds, so this type of task must be given priority, and it is sensitive to short-term performance fluctuations.
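The reservation logic described above, usage plus a safety margin, capped at the user's hard limit, can be sketched like this. The function names and the 25% margin are assumptions for illustration, not Borg's real numbers.

```python
def reservation(observed_usage, user_limit, safety_margin=0.25):
    """Reservation = observed usage plus a safety margin,
    never exceeding the user-declared hard limit."""
    return min(observed_usage * (1 + safety_margin), user_limit)

def reclaimable(observed_usage, user_limit, safety_margin=0.25):
    """Resources between the reservation and the limit can be
    scheduled for other (lower-priority) work."""
    return user_limit - reservation(observed_usage, user_limit, safety_margin)
```

Because usage is re-sampled every few seconds, both values move over time: if the application's real consumption grows, the reservation grows back toward the limit and the reclaimable slack shrinks to zero.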

The other type is the so-called non-prod task: a batch-processing job, similar to MapReduce. It is not directly user-facing and is not very sensitive to performance; when one task finishes, the next one starts. It is not a long-running process.

Why differentiate tasks?

When prod tasks suddenly consume a large amount of resources, for example when a surge of users hits a website and the server's CPU and memory usage spikes, the resources on that machine become insufficient. Borg then kills the non-prod tasks and lets them run on other machines; when the machine is idle again, those tasks can resume. In this way, the resources on the machine are fully used at every point in time. Google's measurements show that about 20% of the workload runs on reclaimed resources. That number is actually very significant: with as many machines as Google has, saving 20% of the resources is a lot of money.
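The eviction policy just described can be modeled as a toy function: when a prod task needs more resources than a machine has free, non-prod tasks are evicted until the request fits, and the evicted tasks can be rescheduled elsewhere. This is an illustrative sketch, not Borg's actual preemption algorithm; names and fields are assumptions.

```python
def preempt_for_prod(machine, needed_cpu):
    """Evict non-prod tasks until `needed_cpu` CPU is free on the machine.
    Returns the names of the evicted tasks so they can be rescheduled."""
    evicted = []
    for task in list(machine["tasks"]):   # copy: we mutate the list below
        if machine["free_cpu"] >= needed_cpu:
            break                         # enough room; stop evicting
        if not task["prod"]:              # prod tasks are never preempted
            machine["free_cpu"] += task["cpu"]
            machine["tasks"].remove(task)
            evicted.append(task["name"])
    return evicted
```

Note that the loop stops as soon as the request fits, so only as many non-prod tasks as necessary are disturbed.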

Borg value

Here I will summarize the value that Borg provides to Google, in three main aspects. First, it hides the details of resource management and fault handling, letting users focus on application development: the user does not have to worry about how the underlying system operates, and if a machine goes down, Borg restarts the application elsewhere. Second, it provides highly reliable, highly available operation, and supports applications that themselves need high reliability and availability. Third, it runs workloads across tens of thousands of machines at high resource utilization.

Google has published a long paper, "Large-scale cluster management at Google with Borg", which contains many of these details.

Kubernetes Architecture

The Borg system has been very successful inside Google. The community outside, however, did not know how it was actually done or how it was implemented internally. Later, the Borg team built another piece of software: Kubernetes. In general, you can think of it as an open-source version of Borg, although there are some differences between Kubernetes and Borg, which I will describe shortly. This is the Kubernetes architecture. As you can see, it is basically similar to the Borg architecture, including how users interact with it: you use kubectl as the command-line tool to submit a task.

Differences between Kubernetes and Borg

Borg has been running at Google for ten years on a huge number of machines, with 10,000 or more in a single cluster. Kubernetes came out in 2014. Personally, I think one motivation is that Amazon has been very successful in the cloud, and Google also wants to enter this field. Its approach is to launch Kubernetes as open source, build influence in the industry, and get everyone using it, so that it can compete with Amazon in the future. That is my personal take.

Borg uses lxc containers while Kubernetes uses Docker containers. Borg is written in C++ and Kubernetes in Go. Borg has done a great deal of optimization on cluster scheduling performance, while Kubernetes has not done much optimization yet; it is still quite rough in this regard, and there is a lot of work to be done later. A single Borg cluster can schedule tens of thousands of machines, while Kubernetes currently supports only hundreds. That is the current state.

Now let's look at the differences in how the two systems are used. Borg's users are Google engineers, who, as we all know, are among the world's top engineers. When they write a program, they already know it will run on the cloud as a distributed system, so they design and optimize the application accordingly from the start. Kubernetes wants to do more: besides running such distributed systems, it also wants to support containerized applications more broadly. It supports Docker containers first, but it also hopes to support applications written by more traditional, more mainstream developers, and it has done some work in this area. One part is the Docker container itself, which supports many things. In addition, Kubernetes can mount an external persistence layer: your containers can read and write external distributed storage, so even if a container crashes, the data is saved securely. It also provides monitoring and logging functions. Are these functions enough? There are still open questions. If you want to use Kubernetes to run traditional applications in the future, you will certainly have to make some changes to those applications and systems, but at least it is not prohibitively difficult.

Another feature of the Kubernetes design is its network architecture: each Pod has its own IP address. This is friendlier to applications, since developers do not need to worry about port conflicts. Pods also define how containers are grouped. And where Borg is a relatively expert system with over 230 configuration parameters, Kubernetes is very simple: about three or four descriptive files.

Visual Summary of Borg and Kubernetes

Here is a visual summary of Borg and Kubernetes. Borg is like a jet fighter: very professional and high-end, suited to giant companies like Google with millions of machines. Kubernetes is a simplified version of Borg, like a well-designed car: suitable for small and medium-sized companies to schedule their clusters.

In the future, Kubernetes will also work on multi-tenant support, container persistence, cluster scale, utilization, and networking.

What will the cloud need in the future?

Finally, my personal thoughts on what the cloud of the future will need. Since computers became widespread, systems and software have appeared wave after wave. Some of them were relatively successful and have survived for a long time, for example Java, C, or Windows. Others were less fortunate, such as Cobol, DOS, or Minix, which were slowly abandoned and forgotten, ending up like cars in an abandoned parking lot.

Consider the technologies we are using today: over the next 10 years, will they end up in the left column or the right? I personally want ours to join the ranks on the left. After all, we hope to build a classic product, one we can be proud of, that can be used for a long time.

If we want to achieve this, we have to face a dilemma confronting the entire computer industry, or at least our cluster management systems. What is this dilemma? It is the dilemma of the Tower of Babel, shown here, a story from the Bible. People wanted to build a tower reaching to heaven, challenging God's authority. Seeing these mortals dare to challenge him, God gave them different languages. The people working on the tower could no longer communicate with each other, and the tower could not be built.

The same is true in the computer world. Everyone uses their own languages and frameworks, which in the end makes our cooperation very complicated. Our cluster management systems, among other systems, are really there to help us bridge this gap and cooperate better. At present, however, there is no solution that lets everyone cooperate really well. I think that is what we need to do in these systems. Now consider these words from Lao Tzu.

Thirty spokes share one hub; it is the empty space at the center that makes the wheel useful.

Clay is shaped into a vessel; it is the empty space within that makes the vessel useful.

Doors and windows are cut out to make a room; it is the empty space that makes the room useful.

Therefore, what is there provides benefit; what is not there provides use.

I was wondering what determines whether a system or piece of software can survive for a long time. I think it is very important for it to know what it wants to do, and equally to know what it does not want to do. You cannot do everything; if you try to do everything, you end up weak everywhere and easy to subvert or replace. If you do one thing well, you will at least be irreplaceable in that field and can survive for a long time. I remember some time ago someone asked Linus how he viewed Docker containers, and he said, roughly, that he does not care about containers; he cares about his kernel, so don't ask him this question. I think that is a very good attitude. He has done his kernel well, done his own system well, so he can continue his work for a long time.

For us, more concretely, the situation we encounter in software development is this: from design to development to testing to production, there are many repetitive processes. At the same time, most cluster systems are very difficult to operate and schedule. In my opinion, these are the problems we need to solve; this is the big direction for us.

Can our future products reduce the complexity caused by different languages, programs, and frameworks? Can they simplify the process, simplify the languages, and simplify network and service dependencies? This is the other question I raise.

