The Pursuit of Minimalism: The Evolution of Docker Image Builds


Author: Bai, technical director of the Internet operations platform at Neusoft, is a graduate of Harbin University of Technology, a Go expert, a GopherChina speaker, a technical trainer and contributor, and the author of the blog tonybai.com. He has many years of experience in back-end service architecture design and development and currently focuses on Docker containers and Kubernetes.
This article was first published in "Programmer" magazine and may not be reprinted without permission. (Editor: Wei Wei)

More than four years have passed since 2013, when dotCloud (since renamed Docker Inc.) released its Docker container technology. In that time Docker has developed rapidly and given rise to a vibrant and enormous ecosystem built around lightweight container technology. As one of Docker's three core technologies, images played a crucial role in that growth: images gave containers their wings, making containers reusable and giving them a standard form for distribution. As a result, every role on the development, delivery, and operations pipeline can work around the same deliverable, and "test what you write, ship what you test" becomes a reality.

Building Docker images is routine for developers who have adopted Docker in their day-to-day work. But how to build images more efficiently, and how to make them smaller, is a common question for Docker beginners, and one that even some veterans have never considered carefully. This article walks through the evolution of Docker image builds from a Docker user's point of view, in the hope that it will be of some help.

1. Images: innovation through inheritance

Before turning to image builds, let's briefly describe images themselves.

Docker is not, in essence, a new technology; rather, it integrates and packages existing technologies better. Kernel-level container technology first appeared in complete form in Sun's Solaris, the most advanced server operating system of its day. Sun released Solaris Containers in 2005, opening the door to kernel containers.

In 2008, the Linux Containers (LXC) feature, led by Google developers, was implemented in the Linux kernel. LXC is a kernel-level virtualization technology built mainly on namespaces and cgroups. It isolates process resources while sharing a single operating system kernel, giving each process an independent virtual execution environment; such an environment is a container. In essence, an LXC container is the same thing as today's Docker container, since Docker is also built on namespaces and cgroups. Docker's innovation is that it defines a container packaging specification based on union file system technology: the application and everything it depends on at run time are packaged into a file of a specific format, which is called an image. The principle is shown below (figure taken from the official Docker website):


Figure 1: How Docker images work

An image is the "serialization" standard for containers. It lays the foundation for storing, reusing, and transferring containers, and it carried containers to every corner of the world, accelerating the spread of container technology.

Unlike earlier kernel container technologies such as Solaris Containers and LXC, Docker also gives developers tooling that is pleasant to use, including the Dockerfile for image builds and a domain-specific language for writing Dockerfiles. Building images from a Dockerfile is repeatable, automatable, and maintainable, and it gives precise control over layers, none of which is true of the traditional approach of creating images with docker commit.

2. "An image is a basket": the beginner's view

"An image is a basket, and everything goes in it": this quip is probably a faithful portrait of how most Docker beginners first think about images. An example makes the point vividly.
We will compile the source file httpserver.go into a program named httpd and ship it as an image. The source file reads as follows:

// httpserver.go
package main

import (
        "fmt"
        "net/http"
)

func main() {
        fmt.Println("http daemon start")
        fmt.Println("  -> listen on port:8080")
        http.ListenAndServe(":8080", nil)
}

Next, we write the Dockerfile used to build the target image:

# Dockerfile
FROM ubuntu:14.04

RUN apt-get update \
      && apt-get install -y software-properties-common \
      && add-apt-repository ppa:gophers/archive \
      && apt-get update \
      && apt-get install -y golang-1.9-go \
                            git \
      && rm -rf /var/lib/apt/lists/*

ENV GOPATH /root/go
ENV GOROOT /usr/lib/go-1.9
ENV PATH="/usr/lib/go-1.9/bin:${PATH}"

COPY ./httpserver.go /root/httpserver.go
RUN go build -o /root/httpd /root/httpserver.go \
      && chmod +x /root/httpd

WORKDIR /root
ENTRYPOINT ["/root/httpd"]

Run the image build:

# docker build -t repodemo/httpd:latest .
... build output omitted ...

# docker images
REPOSITORY                       TAG                 IMAGE ID            CREATED             SIZE
repodemo/httpd                   latest              183dbef8eba6        2 minutes ago       550MB
ubuntu                           14.04               dea1945146b9        2 months ago        188MB

How long the build takes depends on your environment; on an average network connection it may take more than ten minutes. In the end, as expected, a container started from the repodemo/httpd:latest image runs correctly:

# docker run repodemo/httpd
http daemon start
  -> listen on port:8080

One Dockerfile produces one image. A Dockerfile consists of a series of instructions, and executing each instruction produces a separate layer. Let's explore the image we just built:

# docker history 183dbef8eba6
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
183dbef8eba6        21 minutes ago      /bin/sh -c #(nop)  ENTRYPOINT ["/root/httpd"]   0B
27aa721c6f6b        minutes ago         /bin/sh -c #(nop) WORKDIR /root                 0B
a9d968c704f7        minutes ago         /bin/sh -c go build -o /root/httpd /root/h...   6.14MB
aef7700a9036        minutes ago         /bin/sh -c apt-get update       && apt-get...   356MB
<missing>           2 months ago        /bin/sh -c #(nop) ADD file:8f997234193c2f5...   188MB

Setting aside the layers whose size is zero or very small, three sizable layers remain, as shown below:

Figure 2: Exploring the layers of the Docker image
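
To make the proportions concrete, the three sizable layers can be added up. The following Go program is a minimal sketch and not part of the original build. The layer type, the totalMB helper, and the short layer labels are names introduced here for illustration; the sizes are the ones reported by docker history above.

```go
package main

import "fmt"

// layer is a hypothetical record for one row of `docker history` output.
type layer struct {
	Label  string  // short description of the instruction that created the layer
	SizeMB float64 // size as reported by docker history, in MB
}

// historyLayers holds the three sizable layers of repodemo/httpd:latest.
var historyLayers = []layer{
	{"base image (ubuntu:14.04)", 188},
	{"apt-get install golang-1.9-go git", 356},
	{"go build -o /root/httpd", 6.14},
}

// totalMB adds up the sizes of all layers.
func totalMB(layers []layer) float64 {
	var sum float64
	for _, l := range layers {
		sum += l.SizeMB
	}
	return sum
}

func main() {
	total := totalMB(historyLayers)
	for _, l := range historyLayers {
		// Print each layer with its share of the whole image.
		fmt.Printf("%-36s %7.2f MB  %5.1f%%\n", l.Label, l.SizeMB, 100*l.SizeMB/total)
	}
	fmt.Printf("%-36s %7.2f MB\n", "total", total)
}
```

The layer created by the apt-get step accounts for nearly two thirds of the image, even though it only installs the build toolchain.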

Although the Docker engine's cache makes every build after the first very fast on the same host, an image this large all but wipes out the storage and transfer advantages that Docker images are meant to provide. For comparison, the ISO file of an Ubuntu Server 16.04 virtual machine is only a little over 600 MB.

3. "The return of reason": the rise of the builder pattern

Once the initial excitement over a new technology cools, Docker users return to reason. The layer diagram above shows that the final image contains a build environment, which is redundant at run time. The final image only needs a runtime environment sufficient to run httpd, and the base image alone already provides that. So we should remove the unnecessary middle layer:


Figure 3: Removing the unnecessary layer

This raises a question: if the application is no longer built inside the target image, where is it built, and by what? There are at least two options: (1) build locally and copy the result into the image; (2) build with a builder image.

Option 1, the local build, has many limitations: the local build environment cannot be reused, and it does not integrate well into a continuous integration/continuous delivery pipeline. Building with a builder image, by contrast, has become a best practice in the Docker community, and Docker publishes official base images for the mainstream programming languages, including Go, Java, Node.js, Python, and Ruby. The process of building an image with a builder image works as follows:

Figure 4: Flowchart of an image build with a builder image

As the diagram shows, building the target image is split into two stages. Stage one builds the builder image, which compiles the source code. Stage two takes the output of stage one as its input and builds the final target image.

We choose golang:1.9.2 as the base image for the builder. The builder's Dockerfile.build is as follows:

# Dockerfile.build
FROM golang:1.9.2

WORKDIR /go/src
COPY ./httpserver.go .

RUN go build -o httpd ./httpserver.go

Run the build:

# docker build -t repodemo/httpd-builder:latest -f Dockerfile.build .

The compiled httpd binary now sits in the /go/src directory of the repodemo/httpd-builder image. We need a few "glue" commands to connect the two build stages: they extract httpd from the builder image so that it can serve as the input of the next stage:

# docker create --name extract-httpserver repodemo/httpd-builder
# docker cp extract-httpserver:/go/src/httpd ./httpd
# docker rm -f extract-httpserver
# docker rmi repodemo/httpd-builder

These commands copy the compiled httpd program to the local directory. The Dockerfile for the target image is as follows:

# Dockerfile.target
FROM ubuntu:14.04

COPY ./httpd /root/httpd
RUN chmod +x /root/httpd

WORKDIR /root
ENTRYPOINT ["/root/httpd"]

Next we build the target image:

# docker build -t repodemo/httpd:latest -f Dockerfile.target .

Let's check the size of this image:

# docker images
REPOSITORY                       TAG                 IMAGE ID            CREATED             SIZE
repodemo/httpd                   latest              e3d009d6e919        seconds ago         200MB

200 MB. The target image has shrunk by more than half compared with the original.

4. "Strip out everything unnecessary, like a car": the pursuit of the minimal image

The image we built above has shrunk to 200 MB, but that is still not enough. At 200 MB, caching and transferring the image in our network environment remains unsatisfactory. We need to shrink the image further, making it as small as possible, just as a car is stripped of everything unnecessary to save weight: we keep only the libraries and commands required to run our application and leave everything else out of the target image. Size is not the only reason. Small images bring additional benefits: a smaller memory footprint, faster startup, and greater efficiency. They are also more secure, because a smaller "attack surface" means they cannot be attacked through vulnerabilities in unnecessary tools and libraries.


Figure 5: Can the target image be smaller?

Application developers generally do not build their base and target images from the scratch image; they pick a suitable base image instead. The existence of "flyweight" and even "strawweight" official base images makes this possible.

Figure 6: Size comparison of some base images (screenshot from imagelayers.io)

The chart leaves us two choices: busybox and alpine.

In terms of image size alone, busybox is smaller. However, busybox's default libc implementation is uClibc, whereas the libc we normally build against is glibc. So we must either compile the program statically or use the busybox:glibc image as the base image.

The alpine image is another flyweight base image. It uses musl libc, which is smaller and more secure than glibc. Compared with the busybox image, however, the alpine image is slightly larger. Besides musl being larger than uClibc, Alpine also ships its own package manager, apk, in the image, and developers can use apk to add packages or tools to Alpine-based images. For ordinary developers, therefore, the alpine image is the better choice. The catch is that Alpine's libc is musl, which is incompatible with applications compiled against glibc. If you drop the httpd binary built earlier into an alpine image, the container fails at startup with the following error, because the loader cannot find the glibc shared library that the binary was dynamically linked against:

standard_init_linux.go:185: exec user process caused "no such file or directory"
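
One way to see why this error occurs is to check which dynamic loader a binary asks for. The sketch below uses Go's standard debug/elf package to read the PT_INTERP segment of an ELF file. The interpreterOf helper is a name introduced here for illustration, and the program assumes a Linux ELF binary as input. A binary dynamically linked against glibc names a glibc loader there (on x86-64, usually /lib64/ld-linux-x86-64.so.2), a path that does not exist in an alpine image, while a statically linked binary has no such segment.

```go
package main

import (
	"bytes"
	"debug/elf"
	"fmt"
	"io"
	"os"
)

// interpreterOf returns the dynamic loader path recorded in the PT_INTERP
// segment of the ELF file at path, or "" if the file has no such segment
// (which is the case for a statically linked binary).
func interpreterOf(path string) (string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	for _, p := range f.Progs {
		if p.Type != elf.PT_INTERP {
			continue
		}
		raw, err := io.ReadAll(p.Open())
		if err != nil {
			return "", err
		}
		// The segment holds a NUL-terminated path.
		return string(bytes.TrimRight(raw, "\x00")), nil
	}
	return "", nil
}

func main() {
	// Inspect the file given on the command line, or this program itself.
	path, err := os.Executable()
	if err != nil {
		fmt.Println("cannot locate executable:", err)
		return
	}
	if len(os.Args) > 1 {
		path = os.Args[1]
	}

	interp, err := interpreterOf(path)
	switch {
	case err != nil:
		fmt.Println("cannot inspect", path+":", err)
	case interp == "":
		fmt.Println(path, "is statically linked (no PT_INTERP segment)")
	default:
		fmt.Println(path, "requests the loader", interp)
	}
}
```

Running it against the httpd binary extracted from the golang:1.9.2 builder would be one way to confirm which loader the container needs to provide.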

For Go applications we can compile statically. But static compilation means giving up some native capabilities provided by libc. On Linux, for example, the program can no longer use the system's DNS resolution and must rely on Go's own DNS resolver.
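
As a small illustration of that trade-off, Go's standard net package lets a program request the built-in resolver explicitly through the PreferGo field of net.Resolver. This is a sketch under assumptions: the newPureGoResolver helper and the lookup of "localhost" are illustrative choices, not part of the original httpd program. A fully static build is commonly produced by disabling cgo (CGO_ENABLED=0 go build), in which case Go's own resolver is used on Linux regardless of this field.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newPureGoResolver returns a resolver that prefers Go's built-in DNS
// client over the one provided by the system's libc.
func newPureGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

func main() {
	r := newPureGoResolver()

	// Bound the lookup so the example cannot hang on a slow network.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	addrs, err := r.LookupHost(ctx, "localhost")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("localhost resolves to:", addrs)
}
```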

Alternatively, we can use a builder image: the official golang base image provides an Alpine-based variant. Next, we use this approach to build a minimal target image on top of the alpine base image.

Figure 7: Flowchart of an image build with an Alpine builder image

We create two new Dockerfiles for the Alpine version of the target image build, Dockerfile.build.alpine and Dockerfile.target.alpine:
# Dockerfile.build.alpine
FROM golang:alpine

WORKDIR /go/src
COPY ./httpserver.go .

RUN go build -o httpd ./httpserver.go

# Dockerfile.target.alpine
FROM alpine

COPY ./httpd /root/httpd
RUN chmod +x /root/httpd

WORKDIR /root
ENTRYPOINT ["/root/httpd"]

Build the builder image:

# docker build -t repodemo/httpd-alpine-builder:latest -f Dockerfile.build.alpine .

# docker images
REPOSITORY                       TAG                 IMAGE ID            CREATED              SIZE
repodemo/httpd-alpine-builder    latest              d5b5f8813d77        About a minute ago   275MB

Run the glue commands:

# docker create --name extract-httpserver repodemo/httpd-alpine-builder
# docker cp extract-httpserver:/go/src/httpd ./httpd
# docker rm -f extract-httpserver
# docker rmi repodemo/httpd-alpine-builder

Build the target image:

# docker build -t repodemo/httpd-alpine -f Dockerfile.target.alpine .

# docker images
REPOSITORY                       TAG                 IMAGE ID            CREATED             SIZE
repodemo/httpd-alpine            latest              895de7f785dd        seconds ago         16.2MB

16.2 MB. The target image is now less than one-tenth of its previous size, which is the result we hoped for.

5. "Let there be light, and there was light": support for multi-stage builds

We have now reached our goal of a minimal image, but the build process is cumbersome: we must maintain two Dockerfiles, write "glue" commands, clean up intermediate artifacts, and so on. As Docker users we would like a single Dockerfile to handle all of this, and that is why the Docker engine gained support for multi-stage builds. Note: at the time of writing this feature is very new and requires Docker 17.05.0-ce or later.

We now merge Dockerfile.build.alpine and Dockerfile.target.alpine into a single Dockerfile using the multi-stage build syntax:

# Dockerfile
FROM golang:alpine AS builder

WORKDIR /go/src
COPY httpserver.go .

RUN go build -o httpd ./httpserver.go

FROM alpine:latest

WORKDIR /root/
COPY --from=builder /go/src/httpd .
RUN chmod +x /root/httpd

ENTRYPOINT ["/root/httpd"]

The syntax is concise and easy to follow; even someone seeing it for the first time can guess roughly 60% of its meaning. The biggest difference from earlier Dockerfiles is that a Dockerfile for a multi-stage build may contain multiple "FROM baseimage" statements. Each FROM statement starts a build stage, and a stage can be named with the "AS" syntax (builder, in this example). The COPY instruction with --from can also transfer build artifacts between stages, such as the httpd binary here, a job we previously did with "glue" commands.
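
The stage structure can be made concrete with a few lines of Go that scan a Dockerfile for FROM lines. This is only an illustrative sketch: the stage type, parseStages, and the multiStage constant are names introduced here, and the scanner deliberately ignores details such as flags on FROM and line continuations that Docker's real parser handles.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// stage describes one build stage opened by a FROM line.
type stage struct {
	Index int    // position of the stage, starting at 0
	Image string // base image named after FROM
	Name  string // alias given with AS, or "" if the stage is unnamed
}

// parseStages returns one entry per FROM line found in the Dockerfile text.
func parseStages(dockerfile string) []stage {
	var stages []stage
	sc := bufio.NewScanner(strings.NewReader(dockerfile))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 || !strings.EqualFold(fields[0], "FROM") {
			continue
		}
		s := stage{Index: len(stages), Image: fields[1]}
		if len(fields) == 4 && strings.EqualFold(fields[2], "AS") {
			s.Name = fields[3]
		}
		stages = append(stages, s)
	}
	return stages
}

// multiStage is the multi-stage Dockerfile from this section.
const multiStage = `FROM golang:alpine AS builder
WORKDIR /go/src
COPY httpserver.go .
RUN go build -o httpd ./httpserver.go

FROM alpine:latest
WORKDIR /root/
COPY --from=builder /go/src/httpd .
RUN chmod +x /root/httpd
ENTRYPOINT ["/root/httpd"]
`

func main() {
	for _, s := range parseStages(multiStage) {
		fmt.Printf("stage %d: image=%s name=%q\n", s.Index, s.Image, s.Name)
	}
}
```

The first stage is the named builder stage; the second, unnamed stage is the one that becomes the final image.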

Build the target image:

# docker build -t repodemo/httpd-multi-stage .

# docker images
REPOSITORY                       TAG                 IMAGE ID            CREATED             SIZE
repodemo/httpd-multi-stage       latest              35e494aa5c6f        2 minutes ago       16.2MB

As we can see, the image produced by the multi-stage build is equivalent to the one we built earlier with the builder pattern.

6. Back to reality

Following the arc of time, Docker image building has arrived at the present day. Fast builds and small images have become the consensus of the Docker community. After establishing the builder-image best practice, the community has finally gained multi-stage builds as a tool, and building a minimal image is no longer difficult.
