Docker's Underlying Implementation

Last Update: 2015-01-22 Source: Internet

Author: User

Basic architecture

Docker uses a client/server (C/S) architecture. The Docker daemon acts as the server: it accepts requests from clients and processes them (creating, running, and distributing containers). The client and the server can run on the same machine, and they communicate through a socket or a RESTful API.

The Docker daemon typically runs in the background on the host, waiting for messages from clients. The Docker client provides a set of commands that the user runs to interact with the Docker daemon.
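
As a rough illustration of this client/server split, the sketch below sends one HTTP request over a Unix domain socket, which is how the docker client normally reaches a local daemon (by default at /var/run/docker.sock). So that it runs without Docker installed, it starts a stand-in server. The names UnixHTTPConnection, FakeDaemon, request, and demo are made up for this example, and the stand-in only imitates the shape of the daemon's /version endpoint.

```python
import http.client
import json
import os
import socket
import socketserver
import tempfile
import threading
from http.server import BaseHTTPRequestHandler


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that uses a Unix domain socket instead of TCP."""

    def __init__(self, socket_path):
        super().__init__("localhost")  # host name is only used for the Host header
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


class FakeDaemon(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the daemon; it answers GET /version only."""

    def do_GET(self):
        found = self.path == "/version"
        payload = {"Version": "stand-in"} if found else {"message": "not found"}
        body = json.dumps(payload).encode()
        self.send_response(200 if found else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def request(socket_path, method, path):
    """Send one request to the server behind socket_path; return (status, JSON body)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request(method, path)
        response = conn.getresponse()
        return response.status, json.loads(response.read())
    finally:
        conn.close()


def demo():
    """Start the stand-in server on a temporary socket and query it once."""
    socket_path = os.path.join(tempfile.mkdtemp(), "daemon.sock")
    server = socketserver.UnixStreamServer(socket_path, FakeDaemon)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return request(socket_path, "GET", "/version")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

On a machine with Docker installed, calling request("/var/run/docker.sock", "GET", "/version") should reach the real daemon instead, provided the user has permission to open that socket.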

Namespaces

Namespaces are a powerful feature of the Linux kernel. Each container has its own set of namespaces, so the applications running in it behave as if they were running in a separate operating system. Namespaces keep containers from affecting each other.
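
One way to see namespaces from user space is the /proc/<pid>/ns directory, where each entry is a symbolic link whose target looks like pid:[4026531836]. Two processes whose links carry the same number share that namespace. The following is a minimal sketch that reads these links; the function names are chosen for this example, and the code returns an empty result on systems without /proc.

```python
import os
import re

NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into (kind, inode)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def namespaces_of(pid="self"):
    """Return {entry name: inode} for a process, or {} if /proc is unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result
    for name in names:
        try:
            _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # unreadable or unexpected entry: skip it
        result[name] = inode
    return result


def same_namespace(pid_a, pid_b, name):
    """True if both processes expose the same inode for the given ns entry."""
    a = namespaces_of(pid_a).get(name)
    b = namespaces_of(pid_b).get(name)
    return a is not None and a == b


if __name__ == "__main__":
    for name, inode in sorted(namespaces_of().items()):
        print(f"{name:20} {inode}")
```

Comparing a host shell with a process inside a container, for example same_namespace("self", container_pid, "net"), would be expected to return False when the container has its own net namespace; container_pid here stands for a PID you would look up yourself.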

PID namespace

Processes belonging to different users are isolated by PID namespaces, and the same PID can exist in different namespaces. In Docker, the parent of all LXC processes is the Docker process, and each LXC process has its own namespace. Because PID namespaces can be nested, Docker containers can conveniently be nested as well.

NET namespace

PID namespaces isolate process IDs, but the network ports are still those of the host. Network isolation is provided by net namespaces: each net namespace has its own network devices, IP addresses, routing table, and /proc/net directory, so each container's network is isolated. By default, Docker uses a veth pair to connect the virtual network interface in the container to the docker0 bridge on the host.

IPC namespace

Processes in a container interact through the usual Linux inter-process communication (IPC) mechanisms, including semaphores, message queues, and shared memory. Unlike in a VM, however, inter-process communication in a container is really communication between host processes that share the same PID namespace, so namespace information must be added when IPC resources are requested. Each IPC resource has a unique 32-bit ID.

MNT namespace

Similar to chroot, a mnt namespace confines a process to a specific directory tree. It lets processes in different namespaces see different file system structures, so the files and directories seen by the processes in each namespace are isolated. Unlike chroot, the /proc/mounts seen by a container in a given namespace lists only the mount points of that namespace.
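
To make the /proc/mounts point concrete, here is a small sketch that parses the file's whitespace-separated fields (device, mount point, file system type, options). The Mount type and the sample text are made up for illustration; inside a container you would pass in the contents of /proc/mounts and see only that namespace's mount points.

```python
from typing import NamedTuple, Tuple


class Mount(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: Tuple[str, ...]


def parse_mounts(text):
    """Parse /proc/mounts-style text into a list of Mount records."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        device, mount_point, fs_type, options = fields[:4]
        mounts.append(Mount(device, mount_point, fs_type, tuple(options.split(","))))
    return mounts


# Hypothetical sample, not captured from a real container.
SAMPLE = """\
none / aufs rw,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,nosuid,mode=755 0 0
"""

if __name__ == "__main__":
    for mount in parse_mounts(SAMPLE):
        print(mount.mount_point, mount.fs_type)
```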

UTS namespace

The UTS ("UNIX Time-sharing System") namespace allows each container to have its own hostname and domain name, so that it can be treated as a separate node on the network rather than as a process on the host.

User namespace

Each container can have its own user and group IDs, which means that programs inside the container can run as a user defined inside the container rather than as a user on the host.
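
The kernel describes this mapping in /proc/<pid>/uid_map and /proc/<pid>/gid_map, where each line holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The sketch below translates an ID seen inside a container to the ID the host sees. The function names and the sample range starting at 100000 are assumptions for illustration.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside_start, outside_start, length) tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, length = (int(field) for field in fields)
        ranges.append((inside, outside, length))
    return ranges


def to_host_id(container_id, ranges):
    """Translate an ID inside the namespace to the host ID, or None if unmapped."""
    for inside, outside, length in ranges:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None


if __name__ == "__main__":
    # Hypothetical mapping: IDs 0..65535 in the container map to 100000..165535.
    ranges = parse_id_map("         0     100000      65536\n")
    print(to_host_id(0, ranges))     # root inside the container
    print(to_host_id(1000, ranges))  # an ordinary user inside the container
```

With a mapping like this, root inside the container corresponds to an unprivileged ID on the host.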

Control groups

Control groups (cgroups) are a Linux kernel feature used to isolate, limit, and account for shared resources. Only by controlling the resources allocated to each container can competition for system resources be avoided when several containers run at the same time.

Control group technology was first proposed by Google engineers in 2006, and the Linux kernel has supported it since version 2.6.24.

Control groups can limit, account for, and manage a container's memory, CPU, disk I/O, and other resources.
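
Which control groups a process belongs to can be read from /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The sketch below parses that format; the function name and the sample paths are made up for illustration and are not taken from a real host.

```python
def parse_cgroup(text):
    """Map each controller name to its cgroup path.

    On a cgroup v2 (unified) hierarchy the controller list is empty,
    so the path is stored under the empty-string key.
    """
    membership = {}
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _hierarchy_id, controllers, path = parts
        for controller in controllers.split(","):
            membership[controller] = path
    return membership


# Hypothetical sample for a containerized process.
SAMPLE = """\
4:memory:/docker/abc123
3:cpu,cpuacct:/docker/abc123
2:blkio:/docker/abc123
"""

if __name__ == "__main__":
    for controller, path in sorted(parse_cgroup(SAMPLE).items()):
        print(f"{controller:8} {path}")
```

The limits themselves live in files under the corresponding cgroup directory, so the path found here tells you where to look for a given container's memory or CPU settings.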

Union file system

A union file system (UnionFS) is a layered, lightweight, high-performance file system. It records each committed set of changes to the file system as a layer stacked on top of the previous ones, and it can mount different directories under the same virtual file system (unite several directories into a single virtual filesystem).

Union file systems are the basis of Docker images. Images can inherit from one another through layering: starting from a base image (one with no parent image), a variety of application-specific images can be built.

In addition, different Docker containers can share some of the underlying file system layers, each adding its own layer of changes, which greatly improves storage efficiency.

The AUFS (AnotherUnionFS) used by Docker is a union file system. AUFS lets each member directory (called a branch, similar to a Git branch) be given read-only (readonly), read-write (readwrite), or whiteout-able permissions. AUFS also has a concept of layers, so a read-only branch can logically be modified incrementally (without affecting the read-only part).
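
The layering idea can be modeled in a few lines. The sketch below is a toy in-memory model, not how AUFS is implemented: each layer is a dictionary from path to content, lower layers are treated as read-only and shared, writes go to a private top layer, and a deletion is recorded as a whiteout marker that hides the file in the layers beneath. UnionView and WHITEOUT are names invented for this example.

```python
WHITEOUT = object()  # marker: "this path was deleted in the top layer"


class UnionView:
    """One container's view over shared read-only layers plus a private top layer."""

    def __init__(self, read_only_layers):
        self.lower = list(read_only_layers)  # bottom layer first
        self.upper = {}                      # private read-write layer

    def read(self, path):
        # Search from the top layer down; the first hit wins.
        for layer in [self.upper, *reversed(self.lower)]:
            if path in layer:
                content = layer[path]
                if content is WHITEOUT:
                    break  # deleted above: do not look further down
                return content
        raise FileNotFoundError(path)

    def write(self, path, content):
        self.upper[path] = content  # lower layers are never modified

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the path is not visible
        self.upper[path] = WHITEOUT

    def paths(self):
        merged = {}
        for layer in [*self.lower, self.upper]:
            merged.update(layer)
        return sorted(p for p, c in merged.items() if c is not WHITEOUT)


if __name__ == "__main__":
    base = {"/bin/sh": "shell", "/etc/os-release": "base"}
    app = {"/app/run.py": "v1"}
    first, second = UnionView([base, app]), UnionView([base, app])
    first.write("/app/run.py", "v2")
    first.delete("/bin/sh")
    print(first.paths())
    print(second.paths())
```

Both views share the same base and app dictionaries, yet changes made through the first view never show up in the second, which mirrors how containers started from one image share its layers while keeping their own changes.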

The types of union file system currently supported by Docker include AUFS, btrfs, vfs, and DeviceMapper.

Container format

Initially, Docker used the container format from LXC. Since version 0.9, Docker has also supported the new libcontainer format, which is the default option.

Docker network implementation

First, for network communication a machine needs at least one network interface (physical or virtual) to send and receive packets. In addition, communication between different subnets requires a routing mechanism.

The network interfaces in Docker are virtual by default. One advantage of virtual interfaces is their high forwarding efficiency. Linux forwards data between virtual interfaces by copying it inside the kernel: packets in the sending interface's send buffer are copied directly into the receiving interface's receive buffer. To the local system and to the system inside the container, such an interface looks like a normal Ethernet card, but it is much faster because it does not need to communicate with a real external network device.

Docker's container networking uses this technique. It creates one virtual interface on the local host and one in the container and connects them to each other (such a pair of interfaces is called a veth pair).

When Docker creates a container, the following actions are performed:

Create a pair of virtual interfaces, one placed on the local host and the other in the new container;
The host end is bridged to the default docker0 bridge or to a specified bridge and is given a unique name, such as veth65f9;
The container end is placed in the new container and renamed eth0; this interface is visible only in the container's namespace;
An idle address is taken from the bridge's available address range and assigned to the container's eth0, and the container's default route is configured to go through the bridge, which it reaches via veth65f9.
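
The steps above can be sketched as a list of ip commands. The code below only builds the command lines and does not run them, since running them requires root and a real container process. It is an illustration under stated assumptions, not Docker's own code: the function names, the temporary interface name, the PID, and the addresses are hypothetical, the gateway for the default route is taken to be the bridge's own IP address, and nsenter is used to run commands inside the container's net namespace.

```python
import ipaddress


def first_free_address(subnet, used):
    """Return the first host address in subnet that is not in used."""
    taken = {ipaddress.ip_address(address) for address in used}
    for host in ipaddress.ip_network(subnet).hosts():
        if host not in taken:
            return host
    raise RuntimeError(f"no free address left in {subnet}")


def plan_container_network(pid, bridge, bridge_ip, subnet, used, host_if, tmp_if):
    """Build the commands that wire a container (identified by pid) to a bridge."""
    prefix = ipaddress.ip_network(subnet).prefixlen
    address = first_free_address(subnet, [bridge_ip, *used])
    in_netns = ["nsenter", "-t", str(pid), "-n"]  # run inside the container's net namespace
    return [
        # 1. Create the veth pair.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", tmp_if],
        # 2. Attach the host end to the bridge and bring it up.
        ["ip", "link", "set", host_if, "master", bridge],
        ["ip", "link", "set", host_if, "up"],
        # 3. Move the other end into the container and rename it eth0.
        ["ip", "link", "set", tmp_if, "netns", str(pid)],
        in_netns + ["ip", "link", "set", "dev", tmp_if, "name", "eth0"],
        in_netns + ["ip", "link", "set", "eth0", "up"],
        # 4. Assign a free address and set the default route.
        in_netns + ["ip", "addr", "add", f"{address}/{prefix}", "dev", "eth0"],
        in_netns + ["ip", "route", "add", "default", "via", bridge_ip],
    ]


if __name__ == "__main__":
    plan = plan_container_network(
        pid=1234, bridge="docker0", bridge_ip="172.17.42.1",
        subnet="172.17.0.0/16", used=["172.17.0.1"],
        host_if="veth65f9", tmp_if="vethtmp",
    )
    for command in plan:
        print(" ".join(command))
```

Returning argument lists instead of executing them keeps the sketch safe to run anywhere; each list could be handed to subprocess.run by a privileged caller.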

Once this is done, the container can use its eth0 virtual interface to connect to other containers and other networks.

When running docker run, you can specify the container's network configuration with the --net parameter, which accepts four values:

--net=bridge This is the default value; the container is connected to the default bridge.
--net=host Tell Docker not to put the container's network into an isolated namespace, that is, not to containerize the container's network. The container then uses the local host's network and has full access to the local host's interfaces. Like other root processes on the host, container processes can open low-numbered ports and access local network services such as D-bus, and the container can do things that affect the entire host system, such as restarting the host. So be very careful when using this option. If --privileged=true is also used, the container is allowed to configure the host's network stack directly.
--net=container:NAME_or_ID Let Docker place the processes of the new container into the network stack of an existing container. The new container has its own file system, process list, and resource limits, but it shares network resources such as IP addresses and ports with the existing container, and the processes of the two containers can communicate directly over the lo loopback interface.
--net=none Let Docker put the new container into an isolated network stack but perform no network configuration. The user can then configure it manually.
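
The four values can be summarized in a small parser. This is a sketch for illustration only and is not taken from Docker's source; the function names are made up. It splits a --net value into a mode and an optional target container, and reports whether that mode gives the container a net namespace of its own.

```python
def parse_net_mode(value="bridge"):
    """Split a --net value into (mode, target); target is set only for container mode."""
    if value in ("bridge", "host", "none"):
        return value, None
    mode, separator, target = value.partition(":")
    if mode == "container" and separator and target:
        return "container", target
    raise ValueError(f"unsupported --net value: {value!r}")


def gets_own_namespace(mode):
    """bridge and none create a new net namespace; host and container reuse one."""
    return mode in ("bridge", "none")


if __name__ == "__main__":
    for value in ("bridge", "host", "container:web", "none"):
        mode, target = parse_net_mode(value)
        print(value, "->", mode, target, gets_own_namespace(mode))
```

The difference between bridge and none is then only whether Docker goes on to configure the new namespace with a veth pair, an address, and a route, or leaves it empty for the user.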
