Docker Source Code Analysis (VII): Docker Container Network (Part 1)

Last Update:2015-04-05 Source: Internet

Author: User

1. Preface (What Is a Docker Container)

Docker is enormously popular today, and anyone who tries it quickly runs into the concept of the "container", or "Docker Container". So let's begin by looking at the Docker Container from the perspective of its implementation.

As you become familiar with Docker, you will be impressed by how easy it is to deploy and operate applications inside a Docker Container: as long as there is a Dockerfile, one-click deployment of an application is no fantasy. Applications running inside a container can also be resource-controlled and isolated, which goes a long way toward meeting the needs of applications in the cloud computing era. These are features that applications deployed in the traditional way simply do not have. But what is working behind the scenes to support them? At this point you may well think of the powerful Linux kernel.

In fact, much of the credit does belong to the Linux kernel. So let's look at what a Docker Container really is from the kernel's point of view. Developers who have used a Docker Container will have noticed two things: it can run applications (processes), and it provides those processes with an isolated environment. The latter is certainly one of the reasons the industry calls it a "container".

Since a Docker Container runs processes inside it, let's first look at the relationship between containers and processes. Here is a question to consider: can a container exist apart from a process? In other words, can you create a container that has no process inside it?

The answer is no. That means a container cannot come first and a process second, which raises another question: "Are the container and the process born together, or does the process exist before the container?" The answer is the latter, and the reasons are explained below.

Before explaining why a container cannot be separated from a process, here is a statement you will probably agree with: a Docker Container created by Docker is a container that provides an isolated running environment for a group of processes. The question, then, is how the container "isolates" the running environment of that process group. This is where Linux kernel technology makes its entrance.

When it comes to "isolating" a running environment, you are probably familiar with the Linux kernel features namespace and cgroup. Namespaces are primarily responsible for isolating what a process can see, while cgroups are primarily responsible for limiting resource usage. It is the combination of these two kernel features that gives a Docker Container its "isolation". So what do namespaces and cgroups have to do with processes? The answer can be explained in the following order:

(1) When the parent process creates a child process with fork, namespace technology is used to isolate the child's namespaces from those of other processes (including the parent);

(2) After the child process is created, cgroup technology is applied to it to enforce limits on its resource usage;

(3) Inside the child process's namespaces, the system creates the rest of the required isolated environment, such as an isolated network stack;

(4) Once both namespaces and cgroups have been applied, the "isolated" environment of the process is truly established, and the "container" is born.

Seen from the Linux kernel, creating a container boils down to the 4 steps above, which also show how namespaces and cgroups relate to processes and how processes relate to containers. The relationship between a process and a container follows naturally: a container cannot exist apart from a process; the process comes first, and then the container. It is often said that one should "use Docker to create a Docker Container and then run processes inside the container". That phrasing is understandable, because the word "container" is itself an abstraction. A more accurate description would be: "Use Docker to create a process, create an isolated environment for that process (the environment can be called a Docker Container), and then run the user's application process inside it." The author's intention is not to contradict the common understanding of containers, but to explore with the reader the principles behind the underlying technology.

With this more concrete understanding of the Docker Container, attention naturally turns to namespaces and cgroups. That two Linux kernel features can play such a significant role is remarkable. What follows is a brief introduction to both, following the Docker Container implementation process.

Let's start with how namespaces are used, beginning with the user creating and starting a container. When that happens, Docker Daemon forks the first process in the container (call it process A, a child of Docker Daemon). For this fork, Docker Daemon passes 5 flags to the clone system call: CLONE_NEWNS, CLONE_NEWUTS, CLONE_NEWIPC, CLONE_NEWPID, and CLONE_NEWNET (as of Docker 1.2.0, the user namespace is not fully supported). Once these flags are passed to clone, the child process no longer shares namespaces with the parent; instead Linux creates new namespaces for it, so the child runs in an environment isolated from the parent. If process A then forks child processes B and C without passing any namespace flags, B and C share A's namespaces. If Docker Daemon creates another Docker Container whose first process is D, and D forks child processes E and F, those three processes live in yet another set of new namespaces. The namespaces of the two containers differ from each other and from the namespaces in which Docker Daemon itself runs. The use of namespaces in Docker is illustrated as follows:

Figure 1.1 Docker in namespace
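The flag combination described above can be made concrete with a small sketch. The numeric values are the ones defined in the Linux header `<linux/sched.h>`; the helper `new_namespaces` and the constant `CONTAINER_FLAGS` are hypothetical names introduced here for illustration and are not taken from the Docker source. The sketch only computes bitmasks and does not call clone.

```python
# Namespace flags for clone(2), values from <linux/sched.h>.
CLONE_NEWNS = 0x00020000   # mount namespace
CLONE_NEWUTS = 0x04000000  # hostname and domain name
CLONE_NEWIPC = 0x08000000  # System V IPC, POSIX message queues
CLONE_NEWPID = 0x20000000  # process IDs
CLONE_NEWNET = 0x40000000  # network stack

# Hypothetical name: the five flags passed for the container's first process.
CONTAINER_FLAGS = (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC
                   | CLONE_NEWPID | CLONE_NEWNET)


def new_namespaces(flags):
    """Return the names of the namespaces a clone() with these flags creates."""
    names = {
        CLONE_NEWNS: "mnt", CLONE_NEWUTS: "uts", CLONE_NEWIPC: "ipc",
        CLONE_NEWPID: "pid", CLONE_NEWNET: "net",
    }
    return sorted(name for bit, name in names.items() if flags & bit)


# Process A (first process in the container): five new namespaces.
print(hex(CONTAINER_FLAGS), new_namespaces(CONTAINER_FLAGS))
# Processes B and C (forked by A with no flags): nothing new is created,
# so they stay in A's namespaces.
print(new_namespaces(0))
```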

Now for cgroups. As is well known, cgroups can be used to control the resources of a process group. Unlike namespaces, cgroups are not applied at the moment the container's process is created, but afterwards, so that the container process is then placed under resource control. In other words, cgroups can only be applied once the first process in the container actually exists. When that process has been created, Docker Daemon learns its PID and writes the PID to the appropriate location in the cgroup file system, which puts the corresponding resource limits into effect.
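As a rough illustration of "writing the PID into the cgroup file system", the sketch below follows the cgroup v1 layout, where a group is a directory, a limit is a file such as `memory.limit_in_bytes`, and membership is set by writing a PID into `tasks`. The function name, the directory layout under `docker/`, the container ID, and the PID are all assumptions made for the example; it writes into a scratch directory rather than `/sys/fs/cgroup`, so it needs no root privileges and limits nothing for real.

```python
import os
import tempfile


def place_in_cgroup(cgroup_dir, pid, memory_limit_bytes):
    """Apply a memory limit to `pid` by writing into a cgroup v1 style directory."""
    os.makedirs(cgroup_dir, exist_ok=True)
    # The limit is a property of the group, so it is set first.
    with open(os.path.join(cgroup_dir, "memory.limit_in_bytes"), "w") as f:
        f.write(str(memory_limit_bytes))
    # Writing a PID into `tasks` moves that process into the group.
    with open(os.path.join(cgroup_dir, "tasks"), "w") as f:
        f.write(str(pid))


# Dry run against a scratch directory instead of the real cgroup file system.
fake_root = tempfile.mkdtemp()
group = os.path.join(fake_root, "memory", "docker", "example-container-id")
place_in_cgroup(group, pid=12345, memory_limit_bytes=256 * 1024 * 1024)
with open(os.path.join(group, "tasks")) as f:
    print(f.read())
```

The ordering matters in the same way the text describes: the PID can only be written once the process exists, which is why this step comes after the clone call rather than during it.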

Linux kernel namespaces and cgroups thus provide resource isolation and resource limits. Does this isolated and limited environment need any other resources configured? The answer is yes, and this is the point at which network stack resources are added to the container. After the isolated runtime environment has been created for the container process, the container is already in an isolated network environment (a new network namespace), but the process has no usable network stack of its own, such as its own network interface device. At this point Docker Daemon equips the Docker Container with the resources it needs, one by one. For networking, it configures the container's network resources according to the network mode specified by the user.

2. Outline of the Docker Container Network Analysis

The Docker Container Network chapters analyze, from the source code's point of view, how a Docker Container's network is created. The creation process can be simplified as:

Figure 2.1 Docker Container Network creation flowchart

The analysis of the Docker Container network is divided into the following 5 parts:

(1) The network modes of Docker Container;

(2) Container network configuration in Docker Client;

(3) The container network creation process in Docker Daemon;

(4) The network execution process in execdriver;

(5) How libcontainer configures the network in the kernel.

The networkdriver module is not the focus of the Docker Container network creation process, so this analysis does not cover it. Readers may wonder about this, so it is worth spelling out what networkdriver does in Docker: first, it initializes Docker Daemon's network environment (see part six of the Docker Source Code Analysis series for details), for example by creating the docker0 bridge; second, it allocates IP addresses to Docker Containers, sets up port mappings for them, and so on. Very little of this is involved in creating a Docker Container's network; the only point of contact is allocating an IP address for the container's network interface device in bridge mode.

This article is the seventh in the Docker Source Code Analysis series: Docker Container Network (Part 1).

3. Docker Container Network Modes

As mentioned above, Docker can create an isolated network environment for a Docker Container, in which the container uses its own private network. Many Docker developers will already have experienced this network feature.

In fact, besides creating an isolated network environment for a Docker Container, Docker can also create a shared one. In other words, when a developer needs a container to be network-isolated from the host or from other containers, Docker can do that; when a developer needs a container to share a network with the host or with another container, Docker can do that too. Docker can also create no network environment for the container at all.

Docker Container networking can be summarized as 4 modes: bridge mode, host mode, other container mode, and none mode. Each of the 4 modes is introduced briefly below.
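Before the individual modes are described, here is a minimal sketch of how a user-supplied mode string might be classified into these 4 modes. It assumes the spellings accepted by the `--net` option of `docker run` (`bridge`, `host`, `container:<name or id>`, `none`); the function `parse_network_mode` is a hypothetical helper written for this article, not Docker's own parsing code.

```python
def parse_network_mode(value):
    """Split a network-mode string into (mode, target container or None)."""
    if value in ("bridge", "host", "none"):
        return value, None
    if value.startswith("container:"):
        # Other container mode must name the container whose network is shared.
        target = value[len("container:"):]
        if not target:
            raise ValueError("container mode needs a container name or ID")
        return "container", target
    raise ValueError("unknown network mode: %r" % value)


# "web" is a made-up container name used only for this example.
for raw in ("bridge", "host", "container:web", "none"):
    print(raw, "->", parse_network_mode(raw))
```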

3.1 Bridge Mode

Bridge mode is arguably the network mode most commonly used by Docker developers today. Bridge mode creates a separate network stack for the Docker Container, so that the process group inside the container uses its own network environment, and the network stacks are isolated between containers and between each container and the host. In addition, Docker connects the container's network stack to the host's network stack through a network bridge on the host (docker0), which lets the container communicate with the host and with the outside world.

Bridge mode is illustrated in the following figure:

Figure 3.1 Docker Container bridge bridging mode

Bridge mode is implemented in the following steps:

(1) Docker Daemon uses veth pair technology to create two virtual network interface devices on the host, say veth0 and veth1. A veth pair guarantees that whatever network packet one end receives is delivered to the other end;

(2) Docker Daemon attaches veth0 to the docker0 bridge that it created, which ensures that the host's network packets can be sent to veth0;

(3) Docker Daemon moves veth1 into the namespace of the Docker Container and renames it eth0. This ensures that packets the host sends to veth0 are immediately received by eth0, giving the host network connectivity to the container; and because the container has eth0 to itself, the container's network environment stays isolated.
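The three steps above have a close equivalent in standard `ip` (iproute2) and `nsenter` commands, which the sketch below assembles without executing them. This is an illustration only: Docker performs these operations programmatically rather than by running shell commands, and the function name, the container PID 4242, and the extra "bring the interface up" commands are assumptions added for the example. Running the printed commands for real would require root and an existing docker0 bridge.

```python
def bridge_mode_commands(container_pid, bridge="docker0",
                         host_if="veth0", container_if="veth1"):
    """Return commands for the three bridge-mode steps, without running them."""
    pid = str(container_pid)
    return [
        # (1) Create the veth pair on the host.
        ["ip", "link", "add", host_if, "type", "veth",
         "peer", "name", container_if],
        # (2) Attach the host end to the bridge and bring it up.
        ["ip", "link", "set", host_if, "master", bridge],
        ["ip", "link", "set", host_if, "up"],
        # (3) Move the other end into the container's network namespace,
        #     then rename it to eth0 from inside that namespace.
        ["ip", "link", "set", container_if, "netns", pid],
        ["nsenter", "-t", pid, "-n",
         "ip", "link", "set", container_if, "name", "eth0"],
        ["nsenter", "-t", pid, "-n", "ip", "link", "set", "eth0", "up"],
    ]


for cmd in bridge_mode_commands(container_pid=4242):
    print(" ".join(cmd))
```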

In principle, bridge mode gives a Docker Container network connectivity to the host and to other machines. However, the host's IP address and the veth pair's IP address are not in the same network segment, so veth pairs and namespaces alone are not enough for networks outside the host to discover the container. To let the world outside the host reach services exposed inside the container, Docker uses NAT (network address translation), which allows outside machines to initiate network traffic to the inside of the container.

Specifically, when a Docker Container needs to expose a service, the service must listen on the container IP and a port, port_0, so that the outside world can initiate requests to it. The outside world knows only the network address of the host's eth0, not the container's IP address; and even if it did know the container's IP address, at layer 2 it could not reach the application inside the container directly through that address. Docker therefore uses NAT to "bind" the port on which the service listens inside the container to a port on the host, port_1.

The process by which the outside world accesses a service inside a Docker Container is therefore:

(1) The outside world accesses the host's IP and the host port port_1;

(2) When the host receives the request, the DNAT rule rewrites the request's destination IP (the host's eth0 IP) and destination port port_1 to the container IP and the container port port_0;

(3) Because the host knows the container IP, it can send the request to the veth pair;

(4) The veth0 end of the veth pair passes the request to eth0 inside the container, and it finally reaches the service for processing.
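Step (2) of this list, the destination rewrite, can be modeled in a few lines. The sketch below is a toy simulation of what a DNAT rule does to a packet header; it is not how the kernel implements NAT. The `Packet` type, the `DNAT_RULES` table, and every address and port (8080 standing in for port_1, 80 for port_0) are made-up example values.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port")

# One DNAT rule per published port: (host IP, port_1) -> (container IP, port_0).
# All addresses and ports here are made-up example values.
DNAT_RULES = {("10.0.0.5", 8080): ("172.17.0.2", 80)}


def apply_dnat(packet, rules):
    """Rewrite the destination if a rule matches; otherwise leave the packet alone."""
    target = rules.get((packet.dst_ip, packet.dst_port))
    if target is None:
        return packet
    return packet._replace(dst_ip=target[0], dst_port=target[1])


# A request from an outside client to the host's published port.
request = Packet("203.0.113.9", 51000, "10.0.0.5", 8080)
print(apply_dnat(request, DNAT_RULES))
```

Only the destination fields change; the source address and port of the outside client are preserved, which is what lets the service inside the container reply to the right peer.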

DNAT thus allows the world outside the Docker host to initiate access to services inside a Docker Container. How, then, does a Docker Container reach the world outside the host? The process is briefly as follows:

(1) A process inside the Docker Container learns the IP address and port, port_2, of a service outside the host and initiates a request. Because the container has its own network environment, the source IP address of the request packet is the container IP (that of eth0 inside the container), and the Linux kernel automatically assigns the process an available source port (say port_3);

(2) The request leaves through eth0 inside the container and arrives at the other end of the veth pair, veth0, that is, at the bridge (docker0);

(3) IP forwarding is enabled on the host (/proc/sys/net/ipv4/ip_forward), so the request is forwarded from the docker0 bridge to the host's eth0;

(4) The host applies SNAT to the request, translating its source IP address (the container IP address) to the IP address of the host's eth0;

(5) The host sends the translated packet to the outside world, addressed to the request's destination IP address (the IP address of the service outside the host).

At this point many readers will ask: when a process inside a Docker Container initiates a request to the outside, and the request is SNATed on the host before going out, the destination IP address of the response must be the Docker host's IP address. When the response arrives back at the host, how does the host deliver it to the Docker Container? In principle a response addressed to port_3 would not be passed into the container, because there is no DNAT rule for that port on the host. Why is there no DNAT rule for such responses? The reason is simple: DNAT rules exist for the specific ports on which services inside the container listen. The source port of a request initiated inside the container will never be a port a service is listening on, so the response to such a request is not handled by DNAT on the host.

In fact, this case is handled by an iptables rule, which is as follows:

iptables -I FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

This rule means that a packet on the host that is to be sent out through the docker0 bridge is accepted unconditionally if it belongs to a connection that has already been established; the Linux kernel then delivers it over the original connection, that is, into the Docker Container.
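How a reply finds its way back without any DNAT rule is easier to see with a toy model of connection tracking. The sketch below records each outbound flow when SNAT is applied and uses that record to restore the container address on the reply. It is a deliberately simplified illustration, not the kernel's conntrack implementation: the class `ConnTrack`, the value of `HOST_IP`, and all addresses and ports are assumptions, and unlike a real kernel it never rewrites the source port.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port")

HOST_IP = "10.0.0.5"  # example value for the address of the host's eth0


class ConnTrack:
    """Toy connection tracker: remembers the original source of each SNATed flow."""

    def __init__(self):
        self._flows = {}

    def snat_outbound(self, packet):
        # Key the flow by what the reply will look like from the host's side:
        # (remote IP, remote port, local port).
        key = (packet.dst_ip, packet.dst_port, packet.src_port)
        self._flows[key] = packet.src_ip
        return packet._replace(src_ip=HOST_IP)

    def restore_reply(self, packet):
        # A reply belongs to an established flow only if it mirrors a tracked request.
        key = (packet.src_ip, packet.src_port, packet.dst_port)
        container_ip = self._flows.get(key)
        if container_ip is None:
            return None  # no matching flow: not part of an established connection
        return packet._replace(dst_ip=container_ip)


ct = ConnTrack()
# Container 172.17.0.2 uses source port 40000 (port_3) to reach an outside service.
sent = ct.snat_outbound(Packet("172.17.0.2", 40000, "198.51.100.7", 443))
# The service answers to the host's address; the tracker maps it back.
reply = ct.restore_reply(Packet("198.51.100.7", 443, HOST_IP, 40000))
print(sent)
print(reply)
```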

That concludes the brief introduction to bridge mode. Functionally, bridge mode achieves two things: first, it gives the container a separate, isolated network stack; second, it lets the container communicate with the world outside the host through NAT.

However, bridge mode is not everything a developer could wish for. Most obviously, a Docker Container in this mode has no public IP: its address is not in the same network segment as the host's eth0. As a result, the world outside the host cannot communicate with the container directly. NAT makes communication possible through an intermediate step, but it brings its own problems and inconveniences: containers have to compete for ports on the host, clients of a service inside a container need service discovery to learn the service's external port, and so on. In addition, NAT is performed at layer 3 of the network, so it inevitably affects network transmission efficiency.

3.2 Host Mode

Host mode differs greatly from bridge mode. The biggest difference is that host mode does not create an isolated network environment for the container. It is called host mode because a Docker Container in this mode shares the host's network namespace, so the container can use the host's eth0 to communicate with the outside world just as the host does. In other words, the IP address of the Docker Container is the IP address of the host's eth0.

Host mode is illustrated in the following figure:

Figure 3.2 Docker Container Host network mode

The left-most Docker Container uses host mode, while the other two Docker Containers still use bridge mode; the two modes coexist on the host without conflict.

Implementing host mode does not involve docker0 or veth pairs, because no additional bridge or virtual network interface is needed. As mentioned in the earlier discussion of namespaces, when a parent process creates a child process without the CLONE_NEWNET flag, the child shares the parent's network namespace. Docker relies on this simple principle: when it creates the process that starts the container, it does not pass the CLONE_NEWNET flag, so the Docker Container shares the host's network environment. That is host mode.
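Expressed as flag arithmetic, the difference between a bridge-mode and a host-mode container is a single bit. The sketch below is a hypothetical helper, not code from Docker; the flag values are those from `<linux/sched.h>`, and the parameter name `share_host_network` is an assumption made for the example.

```python
# Namespace flags for clone(2), values from <linux/sched.h>.
CLONE_NEWNS = 0x00020000
CLONE_NEWUTS = 0x04000000
CLONE_NEWIPC = 0x08000000
CLONE_NEWPID = 0x20000000
CLONE_NEWNET = 0x40000000


def clone_flags(share_host_network):
    """Pick namespace flags for the container's first process."""
    flags = CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID
    if not share_host_network:
        # Only a container with its own network stack gets a new network namespace.
        flags |= CLONE_NEWNET
    return flags


bridge_flags = clone_flags(share_host_network=False)
host_flags = clone_flags(share_host_network=True)
# The two flag sets differ only in the CLONE_NEWNET bit.
print(hex(bridge_flags ^ host_flags))
```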

Host mode is a good complement to bridge mode. A Docker Container in host mode can use the host's IP address to communicate with the outside world directly; if the host's eth0 has a public IP, the container has that public IP as well. Services in the container can also use the host's ports directly, with no additional NAT. This convenience comes at a cost, most notably weaker network isolation: the container no longer has its own isolated network stack. A service in a host-mode container can run exactly as it would in a traditional deployment, with no changes, but because network isolation is weakened the container shares and competes for the host's network stack. Nor does the container have all port resources to itself any more, because some ports are already occupied by the host and some are already used for the port mappings of bridge-mode containers.

3.3 Other Container Mode

Other container mode is a rather special network mode in Docker. It is called "other container mode" because a Docker Container in this mode uses the network environment of another container. It is "special" because its degree of network isolation lies between bridge mode and host mode: a Docker Container that shares another container's network environment has no network isolation from that container, but both containers remain network-isolated from the host and from all other containers.

Other container mode is illustrated in the following figure:

Figure 3.3 Docker Container other Container network mode

The Docker Container on the right uses other container mode and shares the network environment of the Docker Container on the left, which is in bridge mode.

Implementing other container mode involves no network bridge and no veth pair. Creating a container in this mode takes only two steps:

(1) Find the network namespace of the other container (the container whose network environment is to be shared);

(2) Make the newly created Docker Container (the container that needs to share the other container's network) use that namespace.
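On Linux, the network namespace of a process is exposed as the file `/proc/<pid>/ns/net`, which gives a simple way to picture both steps. The sketch below covers step (1) and a check for the result of step (2); the actual joining is done with the setns system call, which needs root and is therefore only described in a comment. The function names and the `proc_root` parameter are hypothetical, introduced so the logic can be exercised against a scratch directory.

```python
import os


def netns_path(pid, proc_root="/proc"):
    """Step (1): the network namespace of a process is exposed as this file."""
    return os.path.join(proc_root, str(pid), "ns", "net")


def same_net_namespace(pid_a, pid_b, proc_root="/proc"):
    """Two processes share a network namespace when both files are the same inode."""
    a = os.stat(netns_path(pid_a, proc_root))
    b = os.stat(netns_path(pid_b, proc_root))
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


# Step (2) would open netns_path(other_container_pid) and pass the descriptor
# to setns(2) with CLONE_NEWNET before starting the new container's process.
# That call needs root, so it is only described here, not executed.
if os.path.exists(netns_path(os.getpid())):
    # Trivial self-check on a Linux host: a process shares a namespace with itself.
    print(same_net_namespace(os.getpid(), os.getpid()))
```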

Other container mode is well suited to communication between containers.

In this mode a Docker Container can reach the other containers in the same namespace through localhost, which makes transfers more efficient. Although several containers share one network environment, as a group they are still network-isolated from the host and from other containers. This mode also saves some network resources. Note, however, that it does nothing to improve the containers' communication with the world outside the host.

3.4 None Mode

The fourth network mode is none mode. As the name implies, the network environment is "none": Docker configures no network environment for the Docker Container. A container in none mode has only the loopback network device and no other network resources.

None mode performs very little network setup for a Docker Container, but as the saying goes, "less is more": with no network configuration imposed, Docker developers can build any custom networking they like on this foundation. This also reflects the openness of Docker's design philosophy.
